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Chapter  1 

Introduction,  Approach,  and 
Progress  Overview 

1.1  Introduction 

This  report  presents  our  progress  to  date  on  the  DRDC  Autonomous  Sustain  and  Resupply 
project.  We  are  using  the  ASR  problem  as  the  key  motivating  problem  for  the  AI  and 
autonomy  research  in  our  lab. 

The  document  is  structured  as  follows.  First  we  state  our  general  approach,  which  is 
unchanged  from  the  proposal.  Then  we  summarize  our  progress  to  date,  referencing  detailed 
descriptions  of  our  work  provided  as  chapters  of  this  report.  Next  we  provide  the  research 
plan,  which  is  slightly  revised  from  the  version  in  our  proposal.  Next  is  the  literature 
review,  followed  by  our  formal  model  of  the  problem,  and  then  a  design  for  a  planning- 
based  solution.  The  remainder  is  a  series  of  chapters  describing  our  progress  during  Phase 
1  on  various  aspects  of  the  ASR  problem. 

1.2  Approach 

Considering  the  target  problem,  we  can  identify  three  main  sub- problems  that  are  conven¬ 
tionally  considered  in  isolation.  These  are: 

1.  Multi-robot  task  allocation,  preferably  distributed. 

2.  Point-to-point  path-planning  and  navigation  in  dynamic  environments  with  uncer¬ 
tainty. 

3.  Individual  robot  self- maintenance  for  long-term  operation. 

Each  of  these  components  is  a  mature  research  area  in  itself,  with  a  wealth  of  literature. 
For  example,  a  definitive  review  of  multi- robot  task  allocation  methods  is  given  in  [32]. 
Point-to-point  navigation  is  extremely  well-studied;  for  example  a  current  DARPA  project 
addresses  the  problem  of  fast  and  robust  navigation  in  outdoor  unstructured  environments. 
Each  problem  has  well-known  approaches  that  could  be  applied  to  the  target  task. 
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We  believe  that  the  novelty  and  research  value  of  this  project  is  in  the  integration  of 
these  components.  This  provides  two  research  questions:  (i)  what  are  the  precise  boundaries 
and  interfaces  between  the  components,  and  what  alternative  approaches  are  suitable  to  be 
combined  in  this  way?  (ii)  what  synergistic  effects  can  be  found  between  these  components, 
so  that  new,  task- specific  methods  can  be  designed  to  solve  the  task  more  efficiently  than 
by  a  naive  combination  of  previous  approaches? 

To  reflect  this  view,  we  are  following  the  following  methodology,  as  described  in  our 
proposal.  First  we  analyze  the  control  task  formally  and  review  the  relevant  literature  on 
the  component  tasks.  Then  we  engineer  a  “classical”  system:  a  modular  integrated  system 
using  the  most  appropriate  methods  from  the  literature.  This  engineering  exercise  has  two 
benefits;  it  satisfies  the  requirements  of  the  project  by  producing  a  working  system,  thus 
minimizing  the  research  risk  of  the  project,  and  it  gives  us  essential  experience  and  insight 
into  a  running  system.  Our  preference  is  for  decentralized,  distributed  methods  wherever 
possible,  for  their  advantages  in  robustness  through  redundancy,  and  scalability. 

Next,  we  attempt  to  deconstruct  the  classical  system,  creating  novel,  problem- specific 
techniques  to  create  a  “fast  and  frugal”  system.  Using  the  insights  gained  from  our  the 
classical  approach,  and  many  of  the  system  component  already  built,  we  will  attempt  to  de¬ 
vise  task- specific  heuristics  that  provide  increases  in  performance  and/or  scalability,  and/or 
system  cost  and  complexity. 

Throughout,  we  perform  principled  evaluations  of  the  system  performance,  and  attempt 
to  identify  the  cost /ben eht  trade-offs  in  our  design  decisions. 


1.3  Progress  in  this  period 

1.3.1  Personnel 

We  have  employed  a  team  students  from  the  Autonomy  Lab  to  work  on  the  project.  Two 
senior  Phd  students,  Jens  Wawerla  and  Yaroslav  Litus,  and  one  senior  MSc  student,  Adam 
Lein,  have  been  employed  to  date.  The  vast  majority  of  the  budget  is  allocated  to  paying 
student  salaries.  All  these  students  contributed  to  this  report. 

1.3.2  Literature  review 

We  have  reviewed  the  state  of  the  art  in  solving  the  target  problem.  The  key  references  are 
described  in  Chapter  4,  which  is  a  high-level  review.  More  specific  reviews  of  sub- topics  can 
be  found  at  the  beginning  of  each  of  the  research  chapters  at  the  end  of  the  report. 

1.3.3  Modeling  Adaptive  Sustain  and  Resupply 

We  have  developed  a  series  of  formal  models  of  the  ASR  problem,  and  Chapter  3  provides 
the  latest  iteration  of  these. 

1.3.4  Distributed  solutions  to  gradient  optimization  problems 

Robot  rendezvous  is  a  vital  part  of  the  ASR  problem,  and  we  have  been  developing  scalable 
methods  of  achieving  multi- robot,  multi- place  rendezvous  using  fast,  distributed  algorithms. 
We  have  made  considerable  progress  in  the  last  four  months,  and  Chapter  6  describes  our 
approach  in  detail. 
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1.3.5  Optimal  robot  recharging  strategies 

ASR  requires  robots  to  manage  their  energy  supplies  so  as  to  provide  continuous  operation. 
We  are  investigating  the  space  of  optimal  recharging  strategies,  and  have  some  interesting 
an  novel  results  described  in  Chapter  7. 

1.3.6  Adaptive  Foraging 

We  are  investigating  a  family  of  partial  ASR  solutions  based  on  models  of  foraging  behaviour 
in  animals.  Chapter  8  details  our  progress  in  this  period,  in  which  we  have  reproduced  and 
improved  upon  the  previously  most  successful  territorial  foraging  methods. 

1.3.7  Simulation  platform 

We  have  worked  extensively  on  the  Stage  multi- robot  simulator,  making  it  considerably 
more  useful  for  this  project.  The  aims  of  the  current  development  are: 

1.  Add  an  abstract  resource  generation,  transfer  and  consuming  model  to  Stage,  to  model 
the  movement  of  energy  and  other  resources  between  robots. 

2.  Add  a  matching  generic  resource  interface  to  the  Player  server. 

3.  Develop  Stage  models  that  approximate  the  target  robot  platform,  to  be  discussed 
with  DRDC.  The  simulation  experiments  will  be  performed  with  a  mixture  of  the 
DRDC  target  platform  and  models  of  our  locally  available  robots. 

4.  Performance  and  scaling  enhancements  to  ensure  that  Stage  easily  supports  the  target 
number  of  robot  models.  Our  internal  target  is  20  complex  (i.e.  laser  and/or  camera 
equipped,  with  wireless  communication  model)  robots  in  real  time  on  standard  PC 
hardware  in  90%  of  real-time  or  better,  and  200  robots  in  usable  but  slower- than- 
real-time  mode.  Key  performance  gains  will  be  obtained  by  improving  the  ray  tracing 
subsystem,  and  possibly  by  exploiting  SIMD  computation  on  GPU  hardware.  Minor 
performance  gains  will  be  obtained  by  profiling  and  refactoring  frequently- called  code. 

5.  Ongoing  bugfixes,  documentation  and  user  support  throughout  the  life  of  the  program. 

To  date  we  have  made  good  progress  on  point  1.  Figure  1.1  shows  a  screenshot  of  Stage 
running  a  resource-transport  demonstartion  shown  during  our  visit  to  DRDC  in  March 
2008.  The  abstract  resources  are  shown  as  coloured  “jewels”  and  are  generated,  exchanged, 
and  consumed  by  various  simulation  models.  The  red  mobile  robots  are  autonomously 
transporting  jewels  from  the  green  “source”  to  the  red  “sink”  location. 

We  have  greatly  exceeded  our  expectations  on  point  3  -  multi- robot  performance.  See 
Chapter  9  for  a  detailed  description  of  the  progress  on  improving  Stage’s  scalability. 

In  addition,  we  have  added  a  new  feature  to  Stage:  plug-in  control  modules  that  allow 
users  to  attach  arbitrary  compiled  programs  to  models.  Typically  this  programs  are  used 
to  control  robots,  but  they  can  also  be  used  to  run  automated  experiments,  logging,  visu¬ 
alization,  etc.  This  work  is  in  an  early  experimental  form  now,  but  it  is  functional:  the 
transportation  demo  described  above  uses  it  for  robot  controllers.  It  will  be  reported  in 
detail  in  future. 

Parallel  to  this  project  (and  funded  externally)  we  have  continued  our  efforts  with  ed¬ 
ucation  and  promotion  of  the  Player,  Stage  and  Gazebo  platforms  in  the  global  research 
community. 
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Figure  i.i:  Screenshot  fiom  the  current  development  version  of  Stage,  lowing  strict 
resource  model.  The  red  robots  -are  transporting  units  of  resource  (yellow  "jewels17)  fiom 
the  source  location  to  the  goal  location.  All  code  and  configuration  files  are  in  Stage  CVS 
HEAD. 


i.3.3  Real  robot  platform 

Our  custom-designed  Chatterbox  robot  prototype  is  complete  see  Figure  i.2  and  we  are 
in  a  testing  phase.  Onre  the  robot  has  been  £iown  to  be  reliable,  we  will  manufacture 
■iO40  robots,  depending  on  the  final  cost  Robot  construction  is  not  funded  by  this  project^ 
though  consumables  such  as  batteries  are  funded.  The  complete  population  vrill  be  ready 
for  use  probably  by  the  end  of  summer  3103,  and  in  case  by  Christmas  2003^  in  plenty  of 
time  for  Phase  2  real- robot  experiments  and  demonstrations. 


r 


Figure  1,2:  Autonomy  Lab  Chatterbox  robot  pre-production  prototype. 


Figure  1,8:  Autonomy  Lab  Chatterbox  robot  integration  board  pre-production  prototype. 
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Chapter  2 

Research  Plan 


2.1  Schedule  and  milestones 

Our  research  plan  is  modified  only  slightly  from  our  proposal,  with  no  financial  or  deliverable 
changes,  and  as  can  be  seen  from  the  progress  report,  we  are  currently  on  schedule. 

Our  effort  is  structured  as  5  complementary  tasks.  After  the  initial  review  and  formal¬ 
ization  exercise,  tasks  2  and  3  are  sequential,  while  tasks  4  and  5  run  in  parallel  with  the 
others. 

1.  Review  of  State  of  the  Art,  formal  modeling,  and  design  of  research  plan  (complete) 

2.  Engineering  a  “classical”  system  that  performs  the  task,  including  complete  robot 
controller  implementation  and  evaluation  experiments. 

3.  Designing  and  engineering  a  “fast  and  frugal”  solution,  including  complete  robot  con¬ 
troller  implementation  and  evaluation  experiments. 

4.  Infrastructure  software  development:  Player  and  Stage  modifications  to  support  the 
project 

5.  Local  project  management  and  DRDC  interaction 

Each  task  is  described  in  the  sections  below.  Table  2.1  shows  the  proposed  schedule  of 
milestones  and  deliverables,  by  task.  A  deliverable  of  “source,  docs,  demo”  indicates  the 
provision  to  DRDC  buildable  source  code,  documentation  and  examples,  and  a  runnable 
demonstration.  In  addition  to  these  milestones  and  deliverables,  progress  reports  will  be 
delivered  as  required  by  DRDC,  and  academic  papers  published  at  the  earliest  opportunity. 

Task  1:  Review  of  State  of  the  Art,  problem  formalization,  and 
design  of  research  plan 

Completed. 
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Task 

Description 

Deliverable 

Due 

Mo/Yr 

Phase  1 

Research  State  of  the  Art 

l.A 

Report  on  state  of  the  art 

3/08 

l.B 

Revised  Technical  Approach 

report 

3/08 

Phase  2 

Simulation 

Player/Stage  development 

4.A 

Stage  resource  model 

6/08 

4.B 

Player  resource  interface 

Classical  system 

Source,  docs,  demo 

6/08 

2.A 

Single-robot  resupply  &  recharge 

7/08 

2.B 

Multi-robot  (5)  resupply  &  recharge 

8/08 

2.C 

Multi-robot  (50)  resupply  &  recharge 

Source,  docs,  demo 

9/08 

to 

b 

Real-robot  reality  check 
CHECKPOINT  1 

Fast  and  Frugal  system 

9/08 

3.A 

Multi-robot  (5)  resupply  &  recharge 

2/09 

3.B 

Multi-robot  (50)  resupply  &  recharge 

Source,  docs,  demo 

3/09 

3.C 

Real-robot  reality  check 

Evaluation 

3.D 

Simulation  evaluation 

report 

3/09 

CHECKPOINT  2 

3/09 

Phase  3 

Real  Robots 

Classical  System 

3.E 

Multi-robot  (5)  resupply  &  recharge 

7/09 

3.F 

Multi-robot  (20)  resupply  &  recharge 

8/09 

CHECKPOINT  3 

Fast  and  Frugal  system 

9/09 

3.G 

Multi-robot  (5)  resupply  &  recharge 

11/09 

3.H 

Multi-robot  (20)  resupply  &  recharge 
Evaluation 

Source,  docs,  demo 

1/10 

3.1 

Real-world  evaluation 

2/10 

3.J 

FINAL  REPORT 

report 

3/10 

Table  2.1:  Schedule  of  tasks  and  deliverables.  This  report  ends  Phase  1  and  we  move 
Phase  2 


Task  2:  “Classical”  robot  system 

All  personnel  will  contribute  to  the  engineering  and  testing  of  a  multi- robot  system  that 
performs  the  target  task.  The  system  is  composed  of  components  described  in  the  planning 
review  above,  with  the  following  requirements: 

1.  Task  Allocation:  a  subsystem  that  dynamically  allocates  incoming  job  requests  to 
robots  in  an  efficient  manner. 

2.  Transportation:  once  given  a  resupply  task,  a  robot  must  rendezvous  with  the 
source  of  supplies,  transfer  supplies,  then  rendezvous  with  the  sink  of  supplies.  The 
environment  is  assumed  to  be  at  least  partially  unmapped,  dynamic  and  hazardous. 
Robots  are  unreliable  and  may  not  complete  a  rendezvous,  abandoning  their  task. 

3.  Individual  consumable  management:  each  robot  must  maintain  its  own  supply 
of  consumables  to  ensure  long-duration  operations.  The  basic  practical  version  of  this 
is  that  each  robot  must  ensure  that  its  battery  (or  fuel)  never  runs  out,  by  visiting  a 
recharger  (or  fuel  station/ tanker).  If  indefinite  operation  is  required,  recharging  must 
occasionally  occur,  temporarily  taking  priority  over  other  tasks. 

We  have  previously  built  and  published  descriptions  of  working  multi- robot  systems  that 
perform  point- to  point  transportation  and  recharging,  as  outlined  in  Chapter  4. 

Robot  controller  implementation  and  evaluation 

Control  strategies  devised  in  the  project  will  be  built  into  robot  controllers,  implemented  to 
target  the  Player  robot  server.  Our  approach  is  to  decouple  the  strategic  decision  making 
from  the  details  of  low-level  navigation  (except  where  any  low-level  information  can  be 
directly  exploited,  as  happens  for  example  in  synergistic  ant  trail  following).  Examples  of 
basis  behaviours,  to  be  used  as  building- blocks  for  high  level  planning  include  go-to-location, 
rendezvous -with-  other-robot ,  explore-  to- recov  er-  communications-  link,  fin  d-  n  earest- recharger. 

Evaluation  will  be  based  on  a  series  of  simulation  experiments  (in  Phase  2)  and  real- robot 
experiments  (in  Phase  3),  and  will  use  the  metrics  described  above.  We  aim  to  demonstrate 
first  a  single  resupply  robot,  then  5,  then  50  robots.  The  large  population  will  test  the 
scalabilty  of  our  approach. 

Task  3:  “Fast  and  Frugal”  robot  system 

All  personnel  will  contribute  to  the  design  and  engineering  of  revised  robot  controllers. 
In  this  task  we  build  on  the  classical  system,  and  attempt  to  improve  its  performance, 
reliability,  scalability  and  cost.  We  have  previously  been  successful  at  creating  very  efficient, 
high-performance  multi- robot  systems,  essentially  by  considering  the  target  problem  at  two 
levels  -  a  formal  optimization  problem,  and  a  real-time  robot  behaviour  generation  problem 
-  simultaneously.  Essentially  our  approach  is  to  always  remember  that  the  goal  of  the 
optimization  system  is  eventually  to  make  the  robots  move  to  achieve  work.  We  seek  to 
get  the  robots  working  early  and  fast,  by  pushing  computation  as  far  as  possible  down  to 
individual  robots,  without  centralized  control,  and  by  exploiting  local  sensing  as  much  as 
possible  to  avoid  the  expense  of  maintaining  large  internal  representations.  This  is  essentially 
the  application  of  Brooks7  “the  world  is  its  own  best  model77  advice,  or  the  related  principle 
of  “world- embedded  computation77  described  by  Payton. 
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A  thorough  analysis  of  target  task,  including  experience  with  an  implemented  system  is 
required  before  a  detailed  plan  for  this  task  is  possible.  This  is  the  area  of  highest  risk,  but 
potentially  greatest  scientific  contribution  in  this  proposal. 

Task  4:  Player  and  Stage  software  infrastructure  development 

Player  and  Stage  require  some  minor  extensions  for  the  proposed  research  and  experiments. 
These  include: 

*  Add  an  abstract  resource  generation,  transfer  and  consuming  model  to  Stage,  to  model 
the  movement  of  energy  and  other  resources  between  robots. 

*  Add  a  matching  generic  resource  interface  to  the  Player  server. 

*  Develop  Stage  models  that  approximate  the  target  robot  platform,  to  be  discussed 
with  DRDC.  The  simulation  experiments  will  be  performed  with  a  mixture  of  the 
DRDC  target  platform  and  models  of  our  locally  available  robots. 

*  Performance  and  scaling  enhancements  to  ensure  that  Stage  easily  supports  the  target 
number  of  robot  models.  Our  internal  target  is  20  complex  (i.e.  laser  and/or  camera 
equipped,  with  wireless  communication  model)  robots  in  real  time  on  standard  PC 
hardware  in  90%  of  real-time  or  better,  and  200  robots  in  usable  but  slower- than- 
real-time  mode.  Key  performance  gains  will  be  obtained  by  improving  the  ray  tracing 
subsystem,  and  possibly  by  exploiting  SIMD  computation  on  GPU  hardware.  Minor 
performance  gains  will  be  obtained  by  profiling  and  refactoring  frequently- called  code. 

*  Ongoing  bugfixes,  documentation  and  user  support  throughout  the  life  of  the  program. 

As  a  crucial  dependency  for  other  work,  this  task  began  immediately  on  project  start, 
and  will  be  largely  complete  by  9/2008,  though  ongoing  maintenance  and  improvements  will 
be  necessary.  All  personnel  will  work  on  this  task. 

Parallel  to  this  project  (and  funded  externally)  we  will  continue  our  efforts  with  education 
and  promotion  of  the  Player,  Stage  and  Gazebo  platforms  in  the  global  research  community. 

Task  5:  Local  project  management  and  DRDC  interaction 

The  project  management  plan  is  described  in  Section  2  of  the  proposal  and  is  unchanged. 

2.1.1  Checkpoints 

In  addition  to  the  schedule  of  tasks  and  deliverables,  we  have  marked  three  intermediate 
checkpoints,  at  1  year,  18  months  and  2  years.  At  each  checkpoint  we  will  perform  a  review 
exercise  to  assess  if  we  are  on  track  to  reach  the  project  goals,  to  identify  and  address  any 
issues  that  are  discovered,  and  to  evaluate  our  work  in  the  wider  research  context.  The 
checkpoints  are  intended  to  be  an  internal  sanity  check  and  have  no  deliverables. 

2.2  Deliverables 

The  main  deliverables  will  be: 
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1.  A  report  on  the  state  of  the  art,  and  a  revised  work  plan  and  Statement  of  Work,  by 
31/3/2008. 

2.  A  live  or  video  demonstration  of  a  software  system  that  performs  the  target  task  in  a 
standard  robot  simulation,  plus  all  source  code,  documentation,  simulation  configura¬ 
tions  and  supplementary  hies,  by  31/3/09. 

3.  A  live  or  video  demonstration  of  a  hardware  and  software  system  that  performs  the 
target  task  on  real  robots  that  model  DPDC’s  target  robots,  plus  all  source  code, 
documentation  and  supplementary  hies,  by  31/3/10. 

4.  A  hnal  report  including  technical  descriptions  and  critical  evaluations  of  all  compo¬ 
nents,  by  31/3/10. 

We  will  also  deliver  Additional  demonstrations  identihed  in  table  1.1,  along  with  the 
code,  models,  etc.  used  to  perform  each  demo. 

2.2.1  Requirements  for  software  deliverables 

All  robot  control  software  deliverables  will: 

1.  target  the  well-known,  Open  Source,  Player  robot  control  platform.  Player  is  the 
most-used  platform  in  robotics  research  and  education. 

2.  be  written  in  a  programming  language  common  in  the  community,  such  as  C,  C++ 
or  Python. 

3.  come  with  all  Make  hies,  or  equivalent  met  a- level  conhguration  hies  for  tools  such  as 
automake  or  scons,  or  both. 

4.  be  demonstrated  to  work  the  reference  platform  of  Linux  Fedora  Core  5  or  higher, 
though  it  will  be  developed  on  a  variety  of  platforms  and  is  therefore  likely  to  be  very 
portable. 

5.  be  thoroughly  documented  in-line  with  D oxygen. 

6.  be  delivered  with  a  complete  version  control  history  database,  in  CVS  or  SVN. 

7.  depend  only  on  well-known  Free  or  Open  Source  Software  components,  compilers,  etc. 

These  requirements  are  standard  practice  for  all  Autonomy  Lab  projects  and  the  Player 
Project. 

2.2.2  Requirements  for  simulation  model  deliverables 

The  simulation  platform  will  be  the  well-known,  Open  Source,  Stage  multi- robot  simulator, 
of  which  Vaughan  is  the  lead  author  and  maintainer.  Stage  runs  on  the  reference  Linux 
platform,  and  many  others.  All  models  and  simulation  set-ups,  environments,  robot  con¬ 
figurations,  etc,  used  for  experiments  will  be  provided  as  text  hies  and  will  work  with  the 
mainstream  release  of  Player  and  Stage. 

As  co- founder  and  co designer  of  Player,  Vaughan  has  more  than  7  years  of  experience 
developing  and  experimenting  using  Player  and  Stage,  and  all  Autonomy  Lab  students  are 
trained  in  using  and  extending  P/S  for  their  experiments. 
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Chapter  3 


Modeling  Autonomous  Sustain 
and  Resupply 

3.1  Modeling  the  problem 

Our  first  step  is  to  propose  a  formal  model  of  the  problem  which  captures  the  essential 
nature  of  the  informal  description  given  in  the  contract,  compactly  but  without  losing  gen¬ 
erality.  We  also  aimed  to  have  a  principled  metric  for  evaluating  our  solutions.  This  process 
was  considerably  more  difficult  time  consuming  than  we  anticipated,  but  increased  our  un¬ 
derstanding  of  the  problem  considerably. 

Intuitively,  the  model  states  that  we  have  a  set  of  resource  types  and  a  population  of 
mobile  agents,  each  of  which  has  an  individual  resource  carrying  capacity,  and  a  function 
that  describes  how  the  resource  load  changes  over  time.  The  function  depends  on  the  agent’s 
position,  speed,  and  the  current  resource  load.  If  an  agent  has  one  or  more  resources  that 
increase  over  time,  the  agent  is  a  producer.  If  an  agent  has  one  or  more  resources  that 
decrease  over  time,  the  agent  is  a  consumer.  The  consumption  of  some  resources  by  some 
agents  generates  value ,  and  the  system  goal  is  to  maximmize  the  total  value  generated  over 
time.  Other  resources  (e.g.  energy  in  the  form  of  battery  charge)  are  consumed  but  do  not 
generate  value  -  they  are  the  system  overhead.  Agents  can  meet  in  pairs  to  trade  resources, 
and  each  pair  has  a  maximum  rate  at  which  each  resource  type  can  be  exchanged.  The 
system  goal  is  to  generate  motion  plans  for  each  robot  such  that  the  resources  that  exist 
in  the  system  at  startup,  or  are  subsequently  generated,  are  consumed  to  create  the  largest 
possible  value. 

Using  this  simple  but  powerful  scheme,  we  can  represent  a  wide  variety  of  producer- 
transporter- consumer  scenarios  simply  by  selecting  the  appropriate  resources,  resource- 
varying  functions  and  other  parameters.  The  following  is  the  formal  description  of  the 
model. 

3.1.1  Formal  model 

Resources  There  are  m  resource  types  indexed  by  j. 
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Agents  There  are  n  heterogeneous  agents  which  include  all  consumers,  producers  and 
transporters.  Agents  are  indexed  by  z.  They  could  be  seen  as  autonomous  point- agents  with 
spatial  position  of  agent  z  denoted  by  Ti(t)  e  M2  and  dynamics  Ti(t)  =  Vi(t ),  |fy$fy)||  <  Ify 
Stationary  agents  (fixed  sites)  have  Vi  =  0.  Agent  z  has  a  stock  of  gi{i)  =  (gij(t))  amount 
of  resource  j  at  time  t.  At  all  time  the  load  of  agent  should  satisfy  load  validity  predicate 
Gi ,  i.e.  Vi  (<?$(£)).  A  pair  of  agents  can  exchange  resources  and  the  maximum  exchange 
speed  for  agents  z,  k  and  resource  j  during  any  exchange  is  Xikj  =  Xkij . 

The  stock  of  resources  evolves  as  resources  are  produced  and  consumed  by  agents  and  the 
speed  fij(ri(t),ri(t),gi(t))  of  production  or  consumption  depends  on  the  current  location, 
speed  and  resource  stock  of  an  agent. 

Exchanges  Every  agent  has  an  associated  set  Ei  of  exchanges  with  other  agents.  An 
exchange 

e  =  (ts,te,p,  Ag),e  e  Eu 

where  ts  is  the  start  time,  te  is  the  end  time,  p  is  the  index  of  an  exchange  partner  agent 
and  Ag  is  the  change  vector  should  satisfy  the  following  conditions: 

Vi  G  r(e)  :  rfyi)  =  rp(t)  =  rfyfy),  where  r(e)  =  [is,ie).  (partners  are  collocated  and  immobile) (3.1) 

m 

te  >  is  +  (\Agj\Xipj) ,  (feasibility  of  exchange  duration) (3. 2) 
3= 1 

Vj  :  gij(te)  =  9ij(ts)  +  A gj  +  fi;j(0,ri(t),gi(t))dt,  (resource  accounting)(3.3) 

Gi(gi(te)),  (validity  of  load  after  exchange) (3. 4) 
3ef  e  Ep  :  ef  =  (fy,fy,z,  —  A^),  (reciprocity  condition) (3. 5) 

All  exchanges  should  be  temporally  disjoint:  r\e^r(x)  =0. 

Resource  changes  outside  exchanges  If  an  agent  is  not  participating  in  exchange,  his 
resource  stock  changes  as: 

Mt  £  U x€EiT(x)  :  g^t)  =  hj{r &),?&),<}&)),  (3.6) 


3.2  Performance  metrics 

Gross  system  value  Assuming  every  unit  of  resource  j  produced  or  consumed  by  agent 
z  has  a  current  value  of  F \j  the  ad-hoc  gross  system  value  of  the  system  working  from  i0  fill 
i  and  calculated  at  io  is 

n  rn  ^ 

GSVto(t0,T)=Y,EPv  (3.7) 

1=1  j=l 

where  (3  is  a  discount  factor.  Discounting  is  used  to  model  the  descreasing  value  of  work 
done  over  time. 

We  can  also  calculate  the  post-hoc  value  at  T : 

n  rn  ^ 

GSVT(t0,T)  =  J2EPiG  (3.8) 

i=lj=l  ^*0 
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We  can  normalize  this  performance  by  dividing  it  by  the  theoretically  possible  maximum. 
The  maximum  can  be  achieved  if  resources  in  the  system  can  be  momentarily  exchanged 
between  the  agents  to  maximize  the  resulting  discounted  value  as  if  the  system  has  an  ideal 
instantaneous  optimal  resource  transportation  system: 


n  ra  ^ 


GSVt™x(t0,T ) 

=  max  y'V'Pjj  /  fij(ri(t),ri(t),rni(t))e  ^dt,  s.t. 
u,(t),<x(t)  “  Jto 

(3.9) 

8 

II 

(3.10) 

n 

Vi  :  =  1 

2=1 

(3.11) 

where  otij(€)  is  the  share  of  total  resource  j  available  in  the  system  available  for  agent  i 
at  moment  t  and  rriij(t)  gives  the  corresponding  resource  amount.  CSV™** (to,  T)  can  be 
defined  similarly. 

System  value  index 

We  dehne  normalized  performance  indices  as 

PltofaT) 


GSVto(t0,T ) 
GSV£**(to,T) 


(3.12) 


PIr(t o,T)  can  be  defined  similarly. 

After  extensive  consideration  of  the  model  and  metrics,  we  propose  this  metric  rather 
than  the  simpler  measures  of  throughput  and  efficiency  that  appeared  in  our  proposal,  as 
it  better  captures  the  true  “goodness”  of  the  system,  expressed  in  terms  of  the  total  time- 
dependent  value  of  the  resources  used  by  the  consumers. 
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Chapter  4 


Related  Work 


This  chapter  presents  a  high-level  review  of  the  state  of  the  art  in  aspects  of  the  ASR 
problem.  Each  of  the  later  chapters  of  the  report  contains  its  own  self-contained  review  of 
specific  aspects  of  the  problem  studies  during  this  period. 


4.1  Planning 

Most  candidate  solutions  to  resource  transportation  and  delivery  problems  rely  on  planning 
as  a  core  component.  The  problem  is  expressed  in  terms  of  finding  in  advance  a  sequence 
of  actions  that  can  be  executed  to  satisfy  a  set  of  orders  in  the  best  possible  way,  i.e. 
minimizing  an  a  priori  cost  function.  Simple  resource  delivery  problems  are  among  the  set 
of  the  standard  toy  problems  used  to  illustrate  various  planning  approaches  and  alogrithms 
[68]- _ 

Single-agent  discrete  time  planning  is  directly  applicable  to  solving  resource  delivery 
problems  with  a  single  transporter,  discrete  resources  and  a  set  of  static  producers  and 
consumers.  A  good  survey  of  the  classical  planning  system  STRIPS  sufficient  for  solving 
this  class  of  problem  is  given  in  [25]. 

Discrete  time  representation  is  not  appropriate  in  many  cases,  and  temporal  planning 
is  needed  to  succesfully  solve  problems  with  durative  actions  and  resource  utilization.  One 
special  case  of  temporal  planning  is  job  shop  scheduling  which  is  applicable  in  cases  where 
the  partial  order  on  the  sequence  of  actions  is  known.  A  compact  but  thorough  survey  of 
various  techniques  for  job  shop  scheduling  is  given  in  [46] .  Some  problems  could  be  solved 
by  hybrid  planning  and  scheduling  systems  such  as  ISIS  described  in  [28].  High-level  action 
languages  capable  of  operating  with  time  and  resources  were  developed  as  well,  but  usually 
planners  require  domain  specific  heuristics  to  operate  with  them  [47,  30] . 

Introducing  several  agents  capable  of  parallel  operation  gives  another  dimension  to  the 
planning  problem.  Centralized  planning  for  multiple  agents  (multibody  planning)  can  be 
performed  by  using  partial  order  planning  extended  to  handle  action  concurrency  [13].  The 
latest  version  of  Planning  Domain  Description  Language  ,  PDDL  2.1  [27]  can  model  metric 
time  and  continous  resource  utilization.  Several  planners  that  accept  problems  described  in 
PDDL  2.1  were  created  and  are  freely  available  [23,  37]. 

Another  aspect  of  mu ti- agent  planning  is  the  possibility  to  distribute  not  only  plan  exe¬ 
cution,  but  the  planning  itself  between  different  agents.  Different  coordination  mechanisms 
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exist  to  ensure  the  consistency  of  resulting  plan.  Weiss  [94]  provides  a  comprehensive  re¬ 
view  of  coordination  mechanisms  and  other  aspects  of  multi- agent  planning.  Wilkins  [95] 
describes  a  complete  high-level  system  architecture  for  distributed  planning.  MAPL  frame¬ 
work  extends  PDDL  with  qualitative  temporal  relations  (capturing  actions  with  uncertain 
duration)  and  allows  syncronization  on  communicative  acts  [14],  however  no  planner  imple¬ 
mentations  of  MAPL  problems  are  currently  available. 

4.2  Multi- Robot  systems 

The  ASR  problem  is  explicitly  stated  as  a  multi-robot  system  problem.  The  ultimate  goal 
is  to  produce  robot  controllers  that  run  on  each  of  a  group  of  robots,  such  that  the  overall 
system  behaviour  results  in  resources  being  distributed  appropriately.  In  addition  to  the 
robot  controllers,  we  may  have  other  programs  running  on  ancillary  computers  to  manage 
coordinattion  and  planning.  Hence  we  necessarily  have  a  distributed  system  of  robots, 
loosely  coupled  through  their  interactions  in  the  world,  but  also  explicitly  coupled  through 
communicat  ion . 

There  is  an  extensive  literature  and  research  community  in  multi- robot  systems.  The 
canonical  survey  is  [86],  though  this  pre-dates  a  lot  of  interesting  work.  Possibly  the  earliest 
study  of  interacting  electronic  robots  was  by  Walter,  who  describes  the  complex  interactions 
of  his  robots  Elsie  and  Elemer,  controlled  by  “synthetic  brains” :  analog  valve  circuits  mod¬ 
eled  on  neurons  [92].  Multi-robot  systems  are  of  interest  to  researchers  for  several  reasons, 
each  of  which  is  relevant  to  ASR: 

1.  producing  robust  systems  due  to  redundancy  -  unlike  a  single-robot  system,  multi¬ 
robot  systems  can  potentially  survive  the  failure  of  one  or  more  robots. 

2.  scaling  up  -  several  robots  can  do  more  work  than  one,  though  there  are  limits  to 
scalability  in  many  scenarios  due  to  interference. 

3.  heterogeneous  systems  -  in  some  scenarios  it  may  be  easier  or  cheaper  to  have  various 
specialized  robots  instead  of  one  do  anything  robot. 

4.  emergent  behaviour  -  the  coupled  parallel  activity  of  many  simple  agents  can  have  sur¬ 
prisingly  powerful;  the  complex  structures  assembled  by  ants  and  bees  with  relatively 
small  nervous  systems  and  without  central  control  are  the  classic  examples  [41].  This 
idea  is  informally  familiar  as  the  “more- than- the- sum- of- its- parts”  idea. 

5.  competition  -  competition  can  be  a  useful  optimization  mechanism;  for  example  as 
demonstrated  by  natural  selection.  Mult-robot  systems  are  a  natural  platform  for 
competition,  as  well  as  exploit  cooperation. 

4.2.1  Task  Allocation 

A  central  part  of  the  ASR  problem  is  assigning  transport  tasks  to  individual  robots.  This 
is  an  instance  of  the  formal  multi-robot  task  allocation  (MRTA)  problem,  which  has  been 
the  subject  of  many  articles,  but  as  Gerkey  argues  [33]  typically  using  ad-hoc  approaches. 
Gerkey  provides  a  useful  taxonomy  of  MRTA  problems,  obtained  from  combining  these  basic 
properties: 
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1.  single-task  robots  (ST)  vs.  multi-task  robots  (MT):  ST  means 
that  each  robot  is  capable  of  exe-  cuting  as  most  one  task  at  a  time,  while 
MT  means  that  some  robots  can  execute  multiple  tasks  simulta-  neously. 

2.  single-robot  tasks  (SR)  vs.  multi-robot  tasks  (MR):  SR  means  that 
each  task  requires  exactly  one  robot  to  achieve  it,  while  MR  means  that 
some  tasks  can  require  multiple  robots. 

3.  instantaneous  assignment  (I A)  vs.  time-extended  assignment  (TA): 

I A  means  that  the  available  infor-  mat  ion  concerning  the  robots,  the  tasks, 
and  the  en-  vironment  permits  only  an  instantaneous  allocation  of  tasks 
to  robots,  with  no  planning  for  future  alloc  a-  tions.  TA  means  that  more 
information  is  available,  such  as  the  set  of  all  tasks  that  will  need  to  be  as¬ 
signed,  or  a  model  of  how  tasks  are  expected  to  arrive  over  time. 

Problems  are  categorized  by  a  3- tuple  containing  one  of  each  of  these  pairs  of  identifiers: 

For  example,  a  problem  in  which  multi-robot  tasks  must  be  allocated  once 
to  single-task  robots  is  designated  ST-MR-IA. 

Unfortunately,  the  general  ASR  problem  can  be  any  combination  of  these,  depending 
on  the  exact  constraints  imposed  on  the  particular  system.  Therefore  the  analysis  of  the 
differences  between  these  catagories  provied  by  Gerkey  is  not  immediately  useful,  except  to 
elimate  some  approaches  that  he  shows  to  apply  only  to  particular  combinations.  However, 
this  work  is  very  useful  for  understanding  task  allocation,  and  surveys  and  categorizes  the 
previous  work  in  the  area. 

Gerkey  also  provides  a  very  useful  result  in  his  PhD  thesis  [32],  showing  that  optimal 
ST- SR- 1 A  allocation  can  be  achieved  using  the  Hungarian  Method;  well  known  in  other 
fields  but  introduced  to  this  domain  by  Gerkey.  This  result  renders  many  other  approaches 
obsolete,  for  example  various  sub-optimal  auction-based  methods  which  have  been  popular 
in  the  literature.  Gerkey7 s  thesis  is  currently  the  authoritative  survey  of  static  task  allocation 
methods,  while  a  later  article  provides  analysis  of  dynamic  allocation  [54]. 

4.2.2  Rendezvous 

To  perform  resource  exchanges,  two  robots  must  rendezvous,  i.e.  be  colocated  in  space  and 
time  for  some  non- zero  duration. 

The  problem  of  path  planning  for  multiple  robots  to  assemble  in  one  location,  the  ren¬ 
dezvous  problem ,  has  already  received  some  attention.  Most  authors  consider  the  setting 
where  robots  have  incomplete  information  about  locations  of  each  other  which  makes  it 
difficult  to  coordinate  and  agree  on  a  single  meeting  point  [24,  21,  45,  70].  For  example,  [3] 
describes  and  proves  a  distributed  rendezvous  algorithm  for  robots  with  limited  visibility. 
Other  authors  concentrate  on  selecting  a  meeting  point  to  ensure  that  some  specific  proper¬ 
ties  hold,  or  to  optimize  the  formation  during  the  convergence.  Lanthier  et.  al.  [52]  present 
an  algorithm  for  finding  the  meeting  point  which  minimizes  the  maximum  individual  travel 
costs  to  a  single  meeting  point  on  a  weighted  terrain.  Smith  et.  al.  [77]  describe  a  scheme 
which  makes  the  convergence  process  more  organized  in  a  certain  mathematical  sense. 

Our  group  has  done  some  recent  work  on  this  problem.  In  [100,  98]  Zebrowski  and 
Vaughan  introduced  robot- to- robot  recharging,  whereby  a  dedicated  Tanker7  robot  trans¬ 
ports  energy  from  a  fixed  charging  station  and  transfers  it  to  a  worker  robot  by  physical 
docking. 


19 


In  [99]  Zebrowski,  Lit  us  and  Vaughan  examine  the  problem  of  multiple  robots  finding 
a  single  unique  meeting  point  that  minimizes  their  weighted  travel  costs.  This  turns  out 
to  be  an  interesting  classical  problem  (the  Generalized  Fermat-Torricelli  Problem)  with 
no  closed  form  solution.  We  introduced  two  specialized  numerical  methods  to  find  good- 
quality  approximate  solutions,  and  also  a  perfectly  scalable,  local- gradient- descent  method 
that  provably  achieves  the  rendezvous  in  time  proportional  to  the  distance  between  the 
furthest  pair  of  robots.  In  that  paper  we  assume  perfect  shared  knowledge  about  the  robot 
locations  and  shows  an  effective  algorithm  which  minimizes  total  energy  spent  by  the  robot 
team  in  performing  their  rendezvous.  This  problem  is  different  from  that  solved  by  [52] ,  who 
instead  minimized  the  maximum  energy  cost  to  any  individual  robot.  Our  scheme  optimizes 
energy  efficiency ,  which  is  critical  for  some  applications. 

We  characterize  the  rendezvous  problem  as  follows: 

Assume  n  robots  are  located  at  positions  i  =  1  . . .  n.  When  a  robot  moves,  it  expends 
energy  proportional  to  the  length  of  its  trajectory.  Robots  have  individual  energy  costs  c$ 
per  unit  of  traveled  distance,  thus  if  robot  i  moves  from  a  to  fr,  it  spends  c$||a  —  b\\  units 
of  energy.  Now  the  task  is  to  find  a  point  p*  which  minimizes  the  total  energy  spent  by  all 
robots  for  meeting  at  that  point: 


n 

V*  =  argmin  Wcj||p- n\\  (4.1) 

Q  A - / 

2=1 

Problem  (6.1)  is  known  very  well  in  optimization,  where  it  belongs  to  the  family  of  facility 
location  problems.  Historically,  it  has  a  variety  of  names  including  the  Fermat-Steiner  prob¬ 
lem,  Weber  problem,  single  facility  location  problem,  and  the  generalized  Fermat-Torricelli 
problem.  Though  it  is  not  possible  to  find  a  closed-form  solution  for  this  problem,  the 
properties  of  its  solutions  are  well  known  (see  [36]  for  the  case  n  =  3  and  [51]  for  the  general 
case).  Effective  numerical  algorithms  exist  [67].  Interestingly,  it  is  also  possible  to  describe 
the  solution  using  a  mechanical  interpretation  as  a  system  of  idealized  strings,  pulleys  and 
weights  [66]. 

We  will  now  briefly  state  the  properties  of  the  solution  point.  First,  p*  obviously  can 
not  be  located  outside  the  convex  hull  formed  by  Second,  if  points  are  not  colinear 
(not  lying  upon  a  straight  line),  the  goal  function  in  (6.1)  is  strictly  convex,  which  ensures 
the  uniqueness  of  p*  if  n  >  3.  Finally,  it  is  possible  to  prove  the  following  theorem  [51]. 

Theorem  1.  If  ri  are  not  colinear  and  for  each  point  ri 

IIX>A — All  >n,i^3  (4.2) 

£ — /  \\ri  —  rp 
j=1  3  *11 

then  p*  f  Vi  for  any  i  and  G  ||p*I^j|  —  0  (the  floating  case).  If  (6.3)  does  not  hold  for 

some  ri  then  p*  =  ri  (the  absorbed  case). 

Intuitively,  in  the  floating  case  all  robots  meet  at  some  point  which  is  not  the  starting 
point  of  any  robot:  all  robots  move.  In  the  absorbed  case,  one  robot  stays  still  and  the 
others  drive  to  it. 

If  ri  are  colinear,  then  there  may  exist  more  than  one  point  where  minimum  energy  cost 
is  incurred  and  this  method  may  fail.  This  is  not  a  significant  practical  problem  for  two 
reasons.  First,  in  most  real  world  situations,  the  robots  are  unlikely  to  be  perfectly  aligned 
(with  the  exception  of  1-degree  of  freedom  robots  such  as  trains).  Second,  even  in  the 
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colin ear  case  the  “local  dynamic”  method  presented  below  converges  to  a  unique  instance 
of  the  possible  minima. 

Multi-robot  multi-place  rendezvous:  Frugal  Feeding 

Having  proposed  an  heuristic  solution  to  the  rendezvous  problem,  we  introduce  and  tackle 
the  much  harder  problem  in  [57]  whereby  a  single  tanker  or  service  robot  must  visit  each 
worker  robot,  but  not  necessarily  at  the  same  location,  while  minimizing  the  travel  cost  of 
the  whole  team:  the  Frugal  Feeding  Problem. 

The  Ordered  Frugal  Feeding  Problem  can  stated  formally  as  follows: 

Definition  1  (Ordered  Frugal  Feeding  Problem).  Given  tanker  location  po  e  worker 

locations  ri  e  Rd,  i  =  1, . . .  ,  k  and  locomotion  cost  functions  Gi  :  Hd  = 

find 

k 

min  V]  (Co(pi-uPi)  +  Cfir^pi));  (4.3) 

pi,P2,...,Pk  L ' 

2=1 

Ci(x,y)  =  Wi\\y  -  x\\  (4.4) 

Here  Co(x,y)  gives  the  cost  of  tanker  relocation  from  x  to  y,  C\(x,  y)  gives  the  corresponding 

cost  for  worker  i,  and 

Definition  (4)  could  be  amended  to  require  the  tanker  to  return  to  its  original  location 
after  attending  all  workers,  perhaps  to  refuel  itself,  without  affecting  the  following  analysis. 

We  denote  solution  points  as  p|  The  possibility  of  several  robots  being  attended  in  one 
place  is  permitted,  and  captured  by  the  possible  coincidence  of  some  meeting  points. 

The  Fermat-Torricelli  problem  (also  called  the  Steiner- Weber  problem)  asks  for  the 
unique  point  x  minimizing  the  sum  of  distances  to  arbitrarily  given  points  xi,..  .  ,xn  in 
Euclidean  d-dimensional  space.  Elaborating  on  the  conventions  in  the  Fermat-Torricelli 
problem  literature  [51],  we  identify  some  special  cases  of  solution  points  as  follows. 

Definition  2  (Special  cases  for  solution).  If  p|  =  Vj  for  some  j,  we  will  call  p|  worker 
absorbed.  If  pf  =  po,  we  call  p^  tanker  absorbed.  Otherwise ,  we  call  p^  floating. 

If  the  meeting  point  for  worker  i  is  worker  absorbed,  worker  i  should  either  remain  still 
and  wait  for  the  tanker  to  come  to  it  (z  =  j?),  °r  it  should  move  to  the  location  of  worker 
j  (i  j).  If  a  meeting  point  is  tanker  absorbed  for  worker  z,  the  tanker  should  not  move, 
and  the  worker  should  drive  to  the  original  tanker  location.  Floating  meeting  points  do  not 
coincide  with  any  original  robot  location.  Finally,  there  may  exist  hybrid  solutions  which 
combine  absorbed  and  floating  meeting  points. 

In  [57]  we  show  that  the  problem  of  finding  the  most  efficient  order  in  which  to  meet 
the  workers  is  NP-hard,  but  for  a  given  ordering  we  give  a  local  and  perfectly  scalable 
method  to  find  energy-efficient  robot  paths,  where  the  quality  of  the  route  is  very  similar 
to  those  found  by  a  global  numerical  approximation  method.  The  local  method  is  shown 
to  converge  in  time  proportional  to  the  distance  between  the  farthest  robots,  and  to  take 
constant  computation  time  and  memory  per  robot  per  update  cycle.  Clearly  the  Frugal 
Feeding  problem  is  a  sub  problem  of  some  possible  solutions  to  ASR,  and  we  hope  our  FF 
approach  may  be  usefully  adapted  to  a  wider  class  of  ASR  sub-problems. 
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4.2.3  Ant- inspired  Resource  Foraging 

In  [91]  Vaughan  described  a  simple  method  for  a  team  of  robots  to  cooperatively  explore 
an  environment  to  find  a  source  and  sink  of  resources,  then  navigate  between  them,  opti¬ 
mizing  the  route  to  avoid  spatial  interference  between  robots.  The  method  was  inspired  by 
pheromone  trail  following  in  ants,  and  tolerates  dynamic  obstacles,  robot  breakdowns,  etc. 

This  work  was  extended  in  [89,  88]  to  remove  the  earlier  requirement  of  global  localiza¬ 
tion:  the  modified  method  copes  with  unbounded  drift  in  localization  estimates.  Provided 
the  localization  error  growth  rate  is  low  enough  that  a  robot  can  complete  a  single  round- 
trip  reliably,  then  robots  can  usefully  maintain  a  joint  ‘ trail7  data  structure  that  functions 
as  a  dynamic,  minimal  map. 

4.2.4  Recharging 

The  standard  approach  to  robot  recharging  is  to  ignore  it  completely.  A  tiny  fraction  of  the 
mobile  robot  literature  considers  extended  operation  through  recharging  at  all.  Of  those 
that  do  have  recharging  robots,  the  most  common  approach  to  deciding  when  to  charge  is 
to  use  a  simple  threshold,  whereby  reaching  some  lower  bound  on  stored  energy  causes  the 
robot  to  switch  to  a  recharging  mode. 

Very  early  robots  included  theshold- based  recharging  [92].  A  recent  version  includes  [75, 
76]  which  describes  a  probe- like  mechanical  docking  mechanism,  and  a  simple  explore- then- 
recharge  controller  that  was  intended  only  to  demonstrate  the  reliability  of  the  mechanism 
(which  successfully  docked  in  97  out  of  100  autonomous  trials).  The  probe  mechanism  is 
shown  in  Figure  4.1. 

The  iRobot  Roomba,  Create  and  related  commercial  robots  have  autonomous  recharging 
capability.  The  method  used  to  select  recharging  is  likely  to  be  based  on  the  system  described 
in  United  States  Patent  7332890  (issued  19  February  2008),  which  is  a  simple  threshold 
approach. 

The  Roomba  and  Create  have  simple  contacts  under  the  front  of  the  robot,  and  drive 
into  the  dock,  making  contact  with  rounded  spring-loaded  electrode.  A  low  voltage  is  used 
for  safety.  In  our  experience,  the  dock  is  quite  reliable,  and  it  is  certainly  very  low-cost.  We 
chose  the  iRobot  dock  for  our  robot  system,  which  will  be  used  for  this  project.  Figure  4.2 
shows  the  iRobot  charger  and  a  Create  robot. 

The  simplicity  of  the  threshold  approach  is  appealing,  but  it  may  not  be  the  optimal 
strategy.  For  example  our  work  in  [93]  showed  that,  in  a  realistic  surveying  task,  an  adaptive 
threshold  produces  a  higher  overall  work  rate  than  a  static  threshold,  and  a  rate  maximizing 
approach  outperforms  both  by  a  large  margin.  The  biologically-inspired  rate  maximizing 
method  performs  well,  but  it  some  important  limitations  that  are  discussed  in  Chapter  7. 

In  [93]  Wawerla  and  Vaughan  analyze  the  problem  of  deciding  when  and  how  to  recharge 
a  single  robot  during  a  long- duration  survey  task.  We  introduced  a  simple  heuristic  method 
based  on  a  model  of  foraging  behaviour  from  the  behavioural  ecology  literature,  which  is 
shown  empirically  to  perform  within  1  or  2%  of  the  optimal  solution  discovered  by  brute- force 
search.  The  heuristic  is  fast  and  perfectly  scalable,  using  a  greedy  strategy  of  maximizing 
the  rate  at  which  energy  is  gathered  over  time.  Intuitively,  maximizing  the  rate  at  which 
energy  is  input  into  the  robot  means  we  can  maximize  the  rate  at  which  energy  is  output 
in  useful  work. 
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Figure  4,1:  Recharging  dock  mechanism  from  [76],  showing  schematics  of  male  and  female 
components,  and  implementation  fitted  to  a  Pioneer  robot.  Images  copyright  IEEE, 


Figure  4,2:  Recharging  dock  for  the  iRobot  Create  robot.  The  Create  has  simple  contacts 
under  the  Font  of  the  robot,  and  drives  into  the  dock,  making  contact  with  rounded  spring- 
loaded  electrode.  Images  copyright  iRobot  Corp,  taken  bom  their  web  site. 
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4.2.5  Interference  Reduction 

A  key  problem  for  multi- robot  systems  working  in  constrained  environments,  or  when  robot 
spatial  density  is  high,  is  spatial  interference  that  degrades  performance.  The  standard 
approach  is  planning:  robot  trajectories  are  selected  such  that  robots  do  not  interfere  with 
each  other.  Alternative  approaches  have  been  suggested  that  are  more  scalable,  mainly  based 
on  the  idea  of  territoriality,  so  that  robots  operate  in  distinct  work  zones,  and  avoid  each 
other.  This  is  often  considered  in  a  context  of  foraging  for  resources,  the  most  representative 
study  being  [26]. 

In  a  series  of  papers  and  theses  including  [90,  16,  103,  102]  we  have  explored  the  used 
of  local  robot- to- robot  signaling  strategies  to  break  symmetry  and  resolve  conflicts  over 
resources.  We  have  shown  several  refinements  of  an  “aggressive  display”  technique  whereby 
robots  indicate  their  “aggressiveness”  to  each  other,  and  the  more  aggressive  robot  gains 
access  to  the  right  of  way  in  a  narrow  corridor.  While  setting  aggression  at  random  is 
a  good  general  technique  for  breaking  symmetry,  we  have  invented  several  methods  that 
favour  the  globally  “best”  robot  with  high  probability,  while  using  only  local  information. 
Such  rational  aggression  is  shown  to  significantly  increase  the  throughput  of  the  population 
of  robots  performing  a  resupply  task  in  a  confined  space  like  an  office  building  or  narrow 
trail.  The  aggressive  interaction  method  is  applicable  to  any  conflict  over  resources,  where 
cheap  local  symmetry-breaking  is  required. 
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Chapter  5 


Classical  solution 


5.1  Abstract  description 

It  is  natural  to  functionally  decompose  the  problem  into  two  interacting  abstraction  levels. 
At  the  high  level  we  abstract  from  the  details  of  between- exchange  travels.  At  this  level 
we  view  the  problem  as  if  agent  moving  between  two  exchange  locations  is  a  deterministic 
action  of  known  duration  and  required  resource  expenditure.  At  the  high  level  the  problem 
becomes  a  problem  of  multi- agent  temporal  planning  with  continuous  time.  A  plan  for  agent 
i  is  a  sequence  of  exchanges  (e^,  e^,  *  .  * ,  e^J.  The  sequence  can  be  empty,  thus  leaving  some 
of  the  agents  idle.  During  the  plan  execution  conditions  might  change  (new  agents  appear 
or  become  damaged,  agents  are  removed  from  the  system,  travel  time  estimate  turns  out  to 
be  wrong,  etc.).  Therefore,  the  system  should  perform  execution  monitoring  and  replanning 
thus  achieving  continuous  planning  during  the  system  life.  Hence,  solving  the  high  level 
problem  gives  each  agent  the  high-level  task  of  moving  from  current  location  to  the  exchange 
location  and  performing  an  exchange.  This  task  presents  the  low  abstraction  level  of  the 
problem.  It  consists  of  point-to-point  path  planning,  safe  navigation  with  dynamic  obstacle 
avoidance,  global  localization  and  speed  control  and  potentially  some  automatic  exchange 
support.  The  general  architecture  of  the  system  is  shown  at  Fig  5.1. 


5.2  Implementation 

In  this  section  we  informally  outline  possible  implementations  of  different  modules  in  a 
classical  solution.  We  attempt  to  create  a  system  minimizing  the  set  of  modules  that  should 
be  custom  made.  Using  standard  off-the-shelf  solutions  will  make  the  system  support  and 
upgrade  cheaper  and  more  available. 

Planning  task  generator  The  planning  task  generator  performs  two  activities.  First, 
based  on  the  description  of  the  current  agents  state  and  their  properties  as  well  as  the 
resource  values  supplied  exogenously  it  devises  the  set  of  delivery  orders.  These  orders  are 
the  changes  in  resource  loads  of  a  subset  of  agents  that,  if  performed,  will  improve  the 
performance  metric  of  the  system.  Particular  mechanism  that  decide  which  set  of  orders 
should  be  served  depends  on  the  class  of  possible  resource  dynamics  functions  fij  and  the 
representation  of  these  functions.  These  system  design  decisions  influence  the  nature  of 
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Resource  values 


Figure  5.1:  System  architecture.  Resource  values  are  provided  exogenously.  Planning 
task  generator  submits  a  planning  task  to  the  planner,  which  outputs  a  plan  Ei  for  every 
agent.  Execution  monitor  receives  status  updates  from  the  agents  which  include  information 
about  performed  exchanges,  new  agent  registration  and  malfunction  notices.  Monitor  makes 
the  decisions  about  needed  replanning.  If  replanning  is  needed,  agents  report  their  current 
positions  r*(t),  loads  gi  (t) ,  resource  dynamics  functions  fij  and  exchange  speeds  Xtkj  to 
planning  task  generator  and  the  cycle  repeats. 
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the  optimization  task  to  be  solved  and  algorithms  that  will  be  used  for  this  optimization. 
Optimization  is  performed  by  the  order  generator  subsystem.  The  set  of  resulting  orders 
is  passed  to  the  task  encoding  subsystem.  There  is  an  override  mechanism  that  allows 
to  supply  the  set  of  orders  to  be  served  directly  to  the  task  encoding  subsystem.  The 
task  encoding  subsystem  is  responsible  for  encoding  the  set  of  orders  as  a  planning  task 
in  some  formal  planning  description  language  capable  of  describing  multi- agent  temporal 
domains.  PDDL  2.1  described  in  [27]  is  a  good  candidate.  Most  of  the  planning  languages 
separate  domain  description  and  problem  description,  thus  only  problem  description  should 
be  regenerated  every  time  a  new  set  of  orders  arrive.  Problem  description  has  point-to-point 
travels  and  exchanges  as  possible  actions  and  serving  all  orders  as  a  goal.  The  generator 
calculates  the  travel  duration  to  static  agents  or  approximates  the  rendezvous  time  for  pairs 
of  mobile  agents.  This  information  is  used  to  provide  action  durations  to  the  planning 
problem  description.  Depending  on  the  planner  resources  may  need  to  be  discretized  to 
make  a  set  of  possible  exchanges  finite. 

Planner  The  planner  receives  a  problem  description  in  a  formal  planning  language  and 
finds  a  good  plan  that  satisfies  described  goal.  We  intend  to  use  the  freely  available  central¬ 
ized  planner  described  in  [47]  that  solves  problems  defined  in  PDDL  2.1  language.  Later 
we  plan  to  consider  possible  distributed  planning  solutions.  Once  the  planner  finds  a  good 
plan,  exchange  sequences  are  communicated  to  involved  agents. 

Execution  monitor  The  execution  monitor  receives  the  plan  from  the  planner  and  status 
update  messages  from  agents.  The  monitor  compares  the  actual  start  and  end  time  of  the 
exchanges  with  the  planned  time.  If  a  deviation  is  detected,  the  monitor  estimates  the 
impact  on  the  performance.  If  the  deviation  is  significant  to  cause  an  impact  which  exceeds 
a  certain  threshold,  a  replanning  command  is  sent  to  all  agents  and  the  planner. 

Localization  We  assume  high-quality  global  localization  is  available.  This  is  reasonable 
in  the  state  of  the  art,  and  can  provided  outdoors  with  GPS  and  indoors  with  Monte-Carlo 
localization  methods,  as  referenced  in  the  literature  review. 

Navigation  Point  to  point  navigation  can  be  provided  by  various  standard  means.  For 
basic  experiments  we  can  use  the  wavefront  navigation  planner  provided  with  the  Player 
distribution,  though  if  we  need  to  examine  terrain  with  various  motion  costs  we  will  need 
to  add  additional  functionality  to  Player’s  implemntation. 

Obstacle  avoidance  will  use  Player’s  Vector  Field  Histogram  [85]  or  Nearness  Diagram 
[61]  modules.  We  do  not  intend  to  focus  on  this  much- studied  area. 
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Chapter  6 

Distributed  Gradient 
Optimization  with  Embodied 
Appr  oximat  ion 


Authors:  Yaroslav  Litus  and  Richard  Vaughan 
6.1  Abstract 

We  present  an  informal  description  of  a  general  approach  for  developing  decentralized  dis¬ 
tributed  gradient  descent  optimization  algorithms  for  teams  of  embodied  agents  that  need 
to  rearrange  their  configuration  over  space  and/or  time,  into  some  optimal  and  initially 
unknown  configuration.  Our  approach  relies  on  using  embodiment  and  spatial  embedded¬ 
ness  as  a  surrogate  for  computational  resources,  permitting  the  reduction  or  elimination 
of  communication  or  shared  memory  for  conventional  parallel  computation.  Intermediate 
stages  of  the  gradient  descent  process  are  manifested  by  the  locations  of  the  robots,  instead 
of  being  represented  symbolically.  Each  point  in  the  space- time  evolution  of  the  system  can 
be  considered  an  approximation  of  the  solution,  which  is  refined  by  the  agents7  motion  in 
response  to  sensor  measurements.  For  each  agent,  motion  is  approximately  in  the  direction 
of  the  local  antigradient  of  the  global  cost  function.  We  illustrate  this  approach  by  giving 
solutions  to  two  non-trivial  realistic  optimization  tasks  from  the  robotics  domain. 

We  suggest  that  embodied  approximations  can  be  used  by  living  distributed  systems  to 
find  affordable  solutions  to  the  optimization  tasks  they  face. 


6.2  Introduction 

We  know  by  now  that  The  World  Is  It’s  Own  Best  Model  ([15]).  Brooks7  recommendation 
to  avoid  internalizing  the  world  -  a  costly  and  error-prone  process  -  but  rather  to  sense  and 
react  to  it  directly  whenever  possible,  has  become  part  of  the  robotics  canon.  This  paper 
makes  explicit  an  inversion  of  this  approach,  in  which  where  the  world  is  used  to  externalize 
a  computational  process. 
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In  particular,  we  present  an  informal  description  of  a  general  approach  of  developing 
decentralized  distributed  gradient  descent  optimization  algorithms  for  teams  of  embodied 
agents  that  need  to  rearrange  their  configuration  over  space  and  time  into  some  optimal 
and  initially  unknown  configuration.  This  class  of  problems  includes  many  classical  mobile 
agent  problems  such  as  formation  control,  facility  location,  rendezvous,  navigation  and  in¬ 
terference  reduction.  Given  the  intractability  of  finding  optimal  solutions  to  these  large, 
high- dimensional  joint  state- space  planning  problems,  gradient  descent  methods  are  com¬ 
monly  used  to  find  approximate  solutions. 

The  proposed  approach  is  to  use  embodiment  and  spatial  embeddedness  as  a  surrogate 
for  computational  resources,  permitting  the  reduction  or  elimination  of  communication  or 
shared  memory  for  conventional  parallel  computation.  Intermediate  stages  of  the  gradient 
descent  process  are  manifested  by  the  locations  of  the  robots,  instead  of  being  represented 
symbolically.  Each  point  in  the  space- time  evolution  of  the  system  can  be  considered  an 
approximation  of  the  solution,  which  is  refined  by  the  agents7  motion  in  response  to  sen¬ 
sor  measurements.  For  each  agent,  motion  is  approximately  in  the  direction  of  the  local 
antigradient  of  the  global  cost  function. 

Gradient- based  formation  control  and  navigation  have  been  widely  used  in  robotics 
[101,  82].  The  idea  of  embodied  computation  was  recently  presented  in  [38],  however  con¬ 
trary  to  the  authors7  claim  we  believe  that  globally  defined  algorithm  can  be  implemented 
by  embodied  computation.  Finally,  [58]  have  shown  biologically  plausible  navigation  and 
tracking  algorithms  which  involve  using  other  agents  location  to  calculate  local  gradient. 
To  our  knowledge,  this  paper  is  the  first  to  explicitly  present  embodied  approximation  as  a 
method  to  implement  parallel  embodied  gradient  optimization  algorithms. 

We  illustrate  this  approach  by  giving  solutions  to  two  non-trivial  realistic  optimization 
tasks  from  the  robotics  domain.  This  work  is  in  the  context  of  our  interest  in  large-scale 
distributed  systems  such  as  animal  colonies  and  multi- robot  systems,  which  work  together 
to  solve  complex  tasks.  We  are  interested  in  identifying  mechanisms  that  exploit  the  charac¬ 
teristics  of  the  embodied  multi- agent  domain  to  solve  complex  computational  problems.  We 
are  particularly  interested  in  solving  practical  resource  allocation  problems  in  multi- robot 
systems,  with  energy  autonomy  as  the  key  motivating  problem.  This  is  the  motivation  for 
our  choice  of  example  problems:  two  different  versions  of  energy-efficient  robot- robot  ren¬ 
dezvous,  useful  for  for  recharging  or  refueling,  or  as  a  component  of  various  other  tasks. 
While  looking  for  ways  to  find  meeting  places  which  minimize  the  traveling  costs  for  groups 
of  robots  we  developed  fully  decentralized  heuristic  methods  which  also  require  no  range 
information  to  converge  at  good  approximation  to  the  optimal  solution.  These  heuristics 
afford  a  very  simple  implementation. 

The  key  insight  that  underlies  our  approach  is  that  the  spatial  configurations  of  the 
agents  themselves  could  be  considered  as  an  approximate  solution  to  the  entire  problem. 
An  individual  agent  can  move  itself,  thus  refining  the  component  of  current  solution  ap¬ 
proximation.  No  representation  of  the  problem,  or  the  current  solution,  needs  to  be  held  by 
any  robot:  they  manifest  the  solution  by  their  physical  configuration.  This  is  an  example 
of  what  Payton  has  called  “wo rid- embedded  computation77  [64]  and  exploits  the  property  of 
“strong  embodiment77  identified  by  [15],  extended  to  a  multi-agent  system. 
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6.3  Distributed  gradient  optimization  with  embodied 
approximation 

Assume  a  system  of  n  spatially  embedded  agents  is  described  by  a  spatial  configuration 
s  e  H  where  H  =  H±xH2  x  ...  x  Hn  and  Hi  is  a  vector  space  of  possible  configurations 
of  agent  z.  We  will  denote  the  z-th  component  of  s  as  si*  Every  agent  has  control  over 
the  hr st- order  dynamics  of  its  own  configuration  which  allows  it  to  continuously  change  its 
configuration  possibly  with  some  restrictions  on  the  speed  of  this  change.  We  also  assume 
that  every  agent  z  can  sense  the  spatial  configuration  of  a  set  of  other  agents  z/^.  This  set  can 
change  as  the  system  evolves.  Given  some  cost  function  J  :  H  — >  R  we  want  to  rearrange 
the  system  into  an  optimal  configuration  s*  =  argmin sJ(s).  Note  that  the  cost  function 
may  be  parametrized  by  initial  agents  configuration  so* 

If  the  function  J  has  certain  mathematical  properties  (e.g.  is  continuously  differentiable) 
then  a  good  approximation  to  optimal  configuration  can  be  found  in  a  centralized  way 
by  using  one  of  many  gradient  optimization  algorithms.  It  will  start  with  some  initial 
approximation  and  iteratively  change  it  in  constant  or  decreasing  steps  in  the  direction  of 
(approximated)  antigradient.  Details  of  the  particular  algorithms  are  irrelevant  for  present 
discussion.  This  centralized  solution  will  need  a  central  processor  with  amount  of  memory 
of  the  order  of  the  size  of  the  state  vector  s  as  well  as  means  to  communicate  5*  to  all  agents 
once  an  acceptable  approximation  is  found.  Once  all  agents  know  their  component  of  5* 
they  can  proceed  directly  towards  it.  If  J  is  parametrized  by  initial  configuration  of  agents 
So  then  the  means  of  communicating  or  sensing  this  configuration  by  the  central  processor 
are  required. 

Since  the  system  of  n  agents  in  question  has  n  processors  that  can  work  in  parallel  it  is 
natural  to  try  to  use  this  resource  to  get  a  parallel  solution.  Distributed  implementation  of 
iterative  algorithms  are  well- studied  [10,  11,  50].  It  is  known  that  if  processors  synchronously 
compute  and  communicate  their  partial  results  to  other  processors  then  resulting  distributed 
procedure  is  equivalent  to  a  single- processor  implementation  thus  preserving  its  convergence 
properties.  Even  more,  under  certain  realistic  conditions  (like  small  iteration  steps  and 
bounded  communication  delays)  the  synchrony  assumption  can  be  relaxed  and  processors 
can  be  allowed  to  compute  at  different  speeds  and  communicate  sporadically  while  still 
giving  the  same  convergence  properties  as  original  serial  algorithm  [84].  Therefore,  we 
can  assign  every  agent  z  the  task  of  iterative  optimization  of  its  own  component  si  of  the 
global  state  vector  5  by  changing  si  in  the  direction  of  the  corresponding  component  of  the 
antigradient  vector  VJ^  and  periodically  communicate  current  value  of  Si  to  other  agents  as 
well  as  receive  updates  of  the  current  approximations  of  Sj,j  ^  z  from  other  agents.  Note, 
that  the  amount  of  information  from  other  processors  needed  to  compute  VJ^  depends 
on  the  particular  problem.  If  VJ^  depends  only  on  si  then  no  additional  information  is 
necessary  and  si  can  be  computed  independently  of  other  processors.  If  VJ^  depends  on 
some  other  components  of  global  state  s  then  the  current  values  of  these  components  should 
be  received  from  the  processors  which  compute  them.  We  denote  the  set  of  agents  that 
compute  components  necessary  to  calculate  VJ^  by  If  the  assumptions  of  [84]  are  met, 
then  eventually  every  agent  will  know  the  approximation  s\  and  will  be  able  to  proceed 
towards  it. 

However,  a  physical  multi- agent  system  is  much  more  than  simply  n  parallel  processors. 
Physical  embodiment  implies  spatial  embeddedness,  and  for  some  problems  these  remarkable 
dual  properties  allow  us  to  drastically  reduce  or  totally  eliminate  the  need  to  communicate 
intermediate  results  of  calculations,  or  indeed  any  decryption  of  the  problems  or  solutions, 
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substituting  instead  direct  physical  observations.  In  addition,  instead  of  waiting  for  an  iter¬ 
ated  algorithm  to  converge  agents  can  perform  reconfiguration  during  the  optimization  thus 
giving  the  resulting  algorithm  an  attractive  anytime  property.  The  high-level  description  of 
the  approach  is  given  in  Algorithm  1. 


Algorithm  1  Gradient  optimization  with  embodied  approximation 
1:  for  all  agents  i  do 

2:  sense  current  configurations  Sj  for  j  G  ^  H  £$ 

3:  update  using  communication  current  configurations  for  j  GE  ({1,2, ...,z  —  l,z  + 

1,...  ,n}\^)  nfi 
4:  Di  <  ^ Ji Sj \j ) 

5:  if  VJ^  exists  and  component  can  be  improved  then 

6:  move  in  direction  —Di 

7:  else 

8:  stay  still 

9:  end  if 

10:  communicate  new  configuration  to  other  agents 

ll:  end  for 


The  key  idea  is  to  use  agent  configurations  directly  as  approximations  to  correspond¬ 
ing  components  of  global  goal  configuration.  Thus  all  agents  move  in  the  (approximated) 
antigradient  direction  which  is  calculated  at  Step  4  using  their  current  configuration  as  the 
current  approximation  to  the  solution.  The  communication  at  Step  3  is  necessary  only  if 
not  all  of  the  information  from  the  relevant  agents  belonging  set  £$  can  be  acquired  by  direct 
sensing  at  Step  2.  In  the  best  case  no  communication  is  required  as  all  necessary  information 
is  available  via  sensing,  so  ^  D  If  some  components  nevertheless  have  to  be  acquired 
by  communication,  this  could  be  done  in  a  asynchronous  sporadic  manner  as  was  argued 
above. 

This  approach  makes  agents  themselves  serve  as  a  shared  “memory”  for  the  parallel  com¬ 
putation  they  perform  where  the  information  about  current  approximation  is  embodied  in 
physical  configuration  of  the  agents.  Once  the  values  of  necessary  components  are  available, 
gradient  direction  VJ^  or  an  approximation  to  it  can  be  calculated  and  agent  can  move  in 
antigradient  direction  thus  improving  approximation  of  component  Si  and  global  state  s.  In 
a  certain  sense,  agent  relocation  becomes  a  part  of  computation  process  creating  a  parallel 
computer  comprised  by  agent  processors  and  physical  bodies  of  agents.  The  algorithm  con¬ 
tinues  until  convergence  is  reached  which  is  detected  in  a  way  specific  for  particular  problem 
and  algorithm  employed. 

6.4  Applications 

We  illustrate  the  idea  of  optimization  with  embodied  approximation  by  two  problems  of 
energy-efficient  multi-robot  coordination  which  we  studied  in  our  previous  work.  The  first 
problem  is  to  assemble  a  team  of  robots  at  an  initially  unknown  point  which  minimizes  the 
total  energy  spent  on  relocation  [99].  The  second,  more  difficult  problem  is  an  instance 
of  the  multi- facility  location  problem  which  asks  to  plan  a  route  for  a  team  of  robots  to 
rendezvous  a  single  dedicated  service  robot,  possibly  each  at  a  different  point  [56,  57]. 
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6.4.1  Energy-efficient  single-point  rendezvous 

Assume  n  robots  are  located  at  positions  i  =  1  . . .  n.  When  a  robot  moves,  it  expends 
energy  proportional  to  the  length  of  its  trajectory  Robots  have  individual  energy  costs  c$ 
per  unit  of  traveled  distance,  thus  if  robot  i  moves  from  a  to  6,  it  spends  c$||a  —  b\\  units 
of  energy  Now  the  task  is  to  find  a  point  p*  which  minimizes  the  total  energy  spent  by  all 
robots  for  meeting  at  that  point. 

Definition  3  (Energy-efficient  rendezvous  problem).  Given  robot  locations  e  Rd,i  = 
1, . . .  ,  n  find  rendezvous  point 

n 

p*  =  argvdmJ(p)  =  y^Ci\\p- ri\\  (6.1) 

T>  - * 

2=1 

Historically,  this  problem  has  a  variety  of  names  including  the  Fermat-Steiner  problem, 
Weber  problem,  single  facility  location  problem,  and  the  generalized  Fermat-Torricelli  prob¬ 
lem.  Though  it  is  not  possible  to  find  a  closed-form  solution  for  this  problem,  the  properties 
of  its  solutions  are  well  known  (see  [36]  for  the  case  n  =  3  and  [51]  for  the  general  case). 
Effective  numerical  algorithms  exist  [67].  Interestingly,  it  is  also  possible  to  describe  the  so¬ 
lution  using  a  mechanical  interpretation  as  a  system  of  idealized  strings,  pulleys  and  weights 
[66]. 

The  antigradient  of  the  cost  function  is 

n  b  —  a 

-VJ(x)  =  V  u(x,  ri),u(a,  b )  =  -n - ^  (6.2) 

^  Wa-bW 

That  is,  the  antigradient  at  some  point  is  simply  a  sum  of  unit  vectors  pointing  in  the 
directions  of  original  robot  locations.  Note  that  antigradient  is  not  defined  at  the  locations 
of  the  robots  themselves  while  one  of  these  locations  can  be  a  solution.  If  points  ri  are  not 
collinear  (not  lying  upon  a  straight  line),  the  goal  function  in  (6.1)  is  strictly  convex,  which 
ensures  the  uniqueness  of  p*  if  n  >  3.  In  this  case  solution  is  characterized  by  the  following 
theorem  [51]. 

Theorem  2.  If  r$  are  not  collinear  and  for  each  point  r$ 


n 


(6.3) 


then  p *  Vi  for  any  i  and  VJ(p*)  =  0  (the  floating  case).  If  (6.3)  does  not  hold  for  some 

Vi  then  p*  =  Vi  (the  absorbed  case). 

A  simple  embodied  approximation  method  gives  a  useful  solution  to  this  problem.  Let  all 
robots  compute  the  solution  point  approximation  simultaneously  by  checking  the  absorbed 
case  condition  in  Theorem  2  and  moving  along  antigradient  direction  if  the  condition  is 
not  met.  This  produces  the  distributed  Algorithm  2  which  runs  until  robots  converge  to  a 
single  point.  Note  that  since  robots  are  embodied  they  can  not  occupy  the  same  point,  thus 
we  consider  that  two  robots  met  whenever  the  distance  between  them  is  closer  than  some 
meeting  range  e. 

In  order  to  calculate  the  antigradient  at  its  current  location,  each  robot  needs  to  know 
the  direction  towards  the  original  robot  locations  This  could  be  achieved  either  by  means 
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Algorithm  2  Static  algorithm  for  energy-efficient  rendezvous 
1:  for  all  robots  i  do 
2:  Update  current  location  of  self,  xi 

3:  Di  <-  E ciu(x >  w(a>  b )  =  JlffffJ 

4:  if  x  =  Ti  for  some  i  then 

5:  c  c$ 

6:  else 

7:  C^O 

8:  end  if 

9:  if  || All  <  c  then 

10:  stop 

ll:  else 

12:  move  in  direction  Di 

13:  end  if 

14:  end  for 


of  global  localization  and  memorizing  ri  or  by  setting  static  beacons  at  points  ri  (hence  the 
name  of  the  algorithm).  Both  localization  and  static  beacons  are  expensive  solutions,  but 
embodied  approximation  can  be  used  once  again  to  get  rid  of  the  necessity  to  calculate 
directions  to 

Once  some  of  the  robots  move  from  their  original  locations  ri  a  new  instance  of  the 
rendezvous  problem  is  created.  In  this  new  instance  all  robots  are  located  at  the  new 
“original”  locations  ri  thus  sensing  the  instantaneous  directions  to  the  robots  is  enough  to 
calculate  the  antigradient  value.  Hence  memorizing  ri  or  using  static  beacons  is  replaced 
by  using  the  robots  as  dynamic  beacons  resulting  in  the  distributed  Algorithm  3. 


Algorithm  3  Dynamic  algorithm  for  energy-efficient  rendezvous 
1:  for  all  robots  j  do 

2:  Aj  {z|  |  \ri  —  Tj\  |  <  e},  meaning  the  set  of  robots  which  are  closer  to  rj  than  meeting 

threshold  e.  Note  that  j  G  1,  thus  A  has  at  least  one  element. 

3:  Dj  < 

4:  C<-  AgA 

5:  if  || All  <  c  then 

6:  stop 

7:  else 

8:  move  in  direction  Di 

9:  end  if 

10:  end  for 


A  set  of  experiments  was  conducted  to  compare  the  performance  of  these  algorithms  with 
the  performance  of  a  centralized  optimization  algorithm  (see  [99]  for  details).  Some  typical 
results  are  shown  in  Table  6.1.  The  centralized  static  algorithm  calculates  the  optimal 
meeting  point  p*  exactly  once  and  commands  all  robots  to  proceed  directly  to  that  point. 
The  centralized  dynamic  algorithm  periodically  recomputes  the  meeting  point  incorporating 
new  positions  of  the  robots  while  they  move  towards  their  rendezvous.  It  could  be  seen  that 
distributed  methods  achieve  the  quality  of  solution  comparable  to  centralized  methods. 
The  difference  is  explained  by  the  fact  that  the  trajectory  along  the  antigradient  is  not 
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(a)  Centralised  static 


(b)  Local  static 


(c)  Centralised  dynamic 


(d)  Local  dynamic 


Figure  6,1:  Typical  paths  taken  to  single  point  rendezvous  (Map  2) 


Table  6,1:  Mean  Total  Energy  Used  For  Rene 


ezvous.  All  Standard  Deviations  <  5% 


Dynamic 

Static 

Map 

Centralised 

Distributed 

Centralised 

Distributed 

1 

149.99 

151.56 

149.92 

152. T9 

2 

105.21 

115.21 

105.25 

119.00 

3 

rr.se 

80.49 

TT.28 

81.48 

4 

4TT.89 

503.18 

4T6.9T 

530.11 

straight  and  thus  longer  then  the  direct  path  to  the  optimal  meeting  locations  (see  Fig,  6,1), 
However,  in  applications  that  require  scalability  distributed  algorithms  with  low  demands 
on  sensing  and  communication  will  be  preferable  even  if  they  produce  slightly  worse  results 
than  centralized  solutions, 

6*4*2  Ordered  Frugal  Feeding  Problem 

Consider  a  team  of  worker  robots  that  can  recharge  by  docking  with  a  dedicated  refueling 
(or  equivalently,  recharging)  robot  called  a  tanker. ,  as  described  in  [100],  The  tanker  robot 
could  remain  at  a  fixed  location,  acting  as  a  conventional  charging  station,  or  it  could  move 
to  rendezvous  with  worker  robots.  Simultaneously,  worker  robots  can  wait  for  the  tanker  to 
come  to  them,  or  they  can  move  to  meet  the  tanker. 

By  analogy  to  a  mother  animal  attending  her  offspring  we  call  this  problem  the  “Frugal 
Feeding  Problem”,  If  we  impose  a  total  order  in  which  worker  robots  must  be  met  and 
charged,  (perhaps  based  on  urgency  or  some  other  priority  scheme)  we  obtain  the  “Ordered 
Frugal  Feeding  Problem”  considered  here.  As  in  a  single  point  rendezvous  problem  we  model 
locomotion  costs  as  the  weighted  Euclidean  distance  between  the  origin  and  destination. 
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The  Ordered  Frugal  Feeding  Problem  can  be  stated  formally  as  follows: 

Definition  4  (Ordered  Frugal  Feeding  Problem).  Given  tanker  location  p$  e  Hd,  worker 
locations  ri  e  Rd,  i  =  1, . .  . ,  k  find 

min  J(p up2, . . .  ,pjfc)  = 

Pl,P2,-~,Pk 

k 

yj  (^o||Pi  -Pi-i||  +  Wi\\n  -  Pill);  (6.4) 

2=1 

Here  vJo\\Pi  —  Pi-i\  \  gives  the  cost  of  tanker  relocation  between  points  pi-\  andpi ,  vji\\ri  —  pi\\ 
gives  the  cost  of  moving  worker  i  from  its  original  position  to  meeting  point  pi . 

Debnition  (4)  could  be  amended  to  require  the  tanker  to  return  to  its  original  location 
after  attending  all  workers,  perhaps  to  refuel  itself,  without  affecting  presented  results. 

We  denote  solution  points  as  p|  The  possibility  of  several  robots  being  attended  in  one 
place  is  permitted,  and  captured  by  the  possible  coincidence  of  some  meeting  points.  We 
define  a  meeting  as  the  event  when  robots  come  within  distance  s  of  each  other. 

Unlike  the  single- point  rendezvous  case  this  problem  involves  several  relocations  of  the 
system  (tanker  needs  to  go  between  all  rendezvous  points  in  turn).  However,  the  embodied 
approximation  approach  is  applicable  and  results  in  a  distributed  algorithm  which  produces 
good  approximations  to  the  optimal  solutions. 

We  start  with  considering  the  antigradient  of  the  cost  function  J.  Using  Eq  (6.2)  we  can 
express  the  z-th  component  of  antigradient  as 


-VJj(pj)  =  w0w(pi,Pi_i)  +  w0u(pi,pi+ 1)  +  Wiu(pi,ri),  (6.5) 

for  i  =  l..fc  —  1.  The  antigradient  component  for  the  last  robot  k  obviously  includes  one 
less  term.  Now  let  every  worker  robot  i  represent  current  approximation  to  the  solution 
point  p|.  Let  the  tanker  also  represent  the  current  approximation  of  the  meeting  point  for 
the  first  robot  to  be  charged.  That  is,  point  p\  is  simultaneously  represented  by  tanker 
and  worker  1.  The  system  will  perform  parallel  gradient  descent  with  all  robots  moving 
along  the  corresponding  approximation  to  the  antigradient  component  given  by  Eq  (6.5) 
calculated  at  their  own  position.  In  other  words,  robots  0  (the  tanker)  and  1  (the  next 
robot  to  be  charged)  move  along  the  approximated  antigradient  of  the  part  of  cost  function 
g(Pi)  =  wo  |  |jPo  —  Pi||  +^o|  | Pi  —  P2I  |  +u?i|  | pi  —  ri||  using  the  current  position  of  robot  2  as  an 
approximation  to  the  unknown  solution  point  £>2*  The  rest  of  the  robots  j,j  =  2, ...,  k  move 
along  the  approximated  antigradient  of  the  cost  functions  fj(pj)  =  u?o|  | Pj  —  Pj-i\  |  +^o||Pj  — 
jPj_l_i||  +  vJj\\rj  —  Pj ||  using  Tj-hVj+i  as  approximations  for  the  unknown  solution  points 
jPj_i,  Algorithm  4  describes  the  procedure  formally.  Convergence  if  this  algorithm  is 

analyzed  in  [56] . 

Parallel  computation  of  movement  directions  in  steps  11-14  and  simultaneous  movement 
of  all  robots  in  steps  17-20  provide  the  scalability  of  this  algorithm.  Importantly,  the  per- 
robot,  per-timestep  cost  is  constant,  so  the  method  works  for  arbitrary  population  sizes. 
Unlike  the  single- point  algorithms  described  in  previous  section,  here  every  robot  needs  to 
know  the  direction  to  only  two  other  robots  to  calculate  its  movement  direction. 

More  specifically,  this  algorithm  requires  every  worker  to  know  its  own  weight  and  the 
tanker  weight,  whether  or  not  it  is  the  head  of  the  current  charging  queue,  and  the  direction 
towards  the  next  worker  in  the  queue  and  the  previous  worker  (or  tanker  if  the  robot  is  at 
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Algorithm  4  Distributed  Algorithm  for  Ordered  Frugal  Feeding  Problem 
1:  i<-l 

2:  define  d(x,  y)  =  (y  —  x)/\  \x  —  y\\ 

3:  let  ro  be  the  current  tanker  position,  ri  be  the  position  of  worker  i 

4:  while  Kndo 

5:  if  ro  is  close  to  ri  then 

6:  tanker  charges  worker  i;  i  <—  i -j-  1 

7:  else 

8:  if  i  =  n  (only  one  robot  in  queue)  then 

9:  the  lighter  of  tanker  and  worker  goes  towards  the  other 

10:  else 

H;  ^n-\- 1  —  ? n 

12:  Bo  <—  yj1d(ro^r1)  +  vj0d(r0,  r2). 

13:  Di  wod{Ti,  To)  +  wod{Ti,  ri+1) 

14:  for  all  i  <  j  <  n  do 

15:  Dj  vJod(rj,Tj_i)  +  vjod(rj,Tj+1) 

16:  end  for 

17:  for  all  j  e  {0,  z,  i  +  1, . . . ,  n}  do 

18:  if  \\Dj\  \  <  Wj  then 

19:  robot  j  stops 

20:  else 

21:  robot  j  proceeds  in  the  direction  Dj 

22:  end  if 

23:  end  for 

24:  end  if 

25:  end  if 

26:  end  while 


the  head  of  the  queue).  The  tanker  needs  to  know  its  own  weight,  the  weights  of  the  first 
two  robots  in  the  queue  and  the  directions  towards  them.  As  a  robot  is  met  and  charged, 
it  is  removed  from  the  queue,  and  this  update  is  broadcast  to  all  robots. 

As  in  the  single  point  rendezvous  case  here  instead  of  operating  with  the  model  of  the 
world  and  searching  for  the  complete  solution,  each  robot  uses  the  position  of  itself,  its 
queue  predecessor  and  successor  as  the  current  embodied  approximation  to  the  solution 
points.  Every  robot  tries  to  improve  the  global  solution  quality  by  moving  in  the  direction 
which  decreases  the  part  of  the  total  cost  function  that  concerns  itself  and  its  neighbors.  If 
the  robot  finds  itself  located  at  the  minimizing  point  for  the  current  local  configuration  of 
robots,  the  robot  stops.  Fig  6.2  shows  some  trajectories  produced  by  the  algorithm. 

To  evaluate  the  performance  of  the  algorithm  we  performed  3000  simulations  running 
distributed  algorithm  for  Ordered  Frugal  Feeding  Problem  on  randomly  generated  instances 
of  the  problem,  with  initial  robot  locations  drawn  from  the  same  uniform  distribution, 
and  3  different  uniform  distributions  of  weights,  with  1000  experiments  for  each  weight 
distribution.  In  each  experiment,  10  randomly  weighted  worker  robots  and  a  tanker  were 
placed  at  random  locations  in  an  square  arena  with  20m  sides.  A  meeting  range  of  a 
0.1m  and  a  movement  step  length  of  0.01m  were  set.  The  path  traveled  by  each  robot 
was  recorded,  and  it’s  length  was  multiplied  by  the  robot’s  weight,  then  summed  for  the 
population  to  give  the  total  energy  spent  on  performing  the  rendezvous.  For  comparison 
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Figure  6.2:  Some  trajectories  produced  by  distributed  algorithm  for  Ordered  Frugal  Feeding 
Problem.  Triangle  represents  tanker,  small  disks  represent  workers,  large  disks  show  the 
meeting  ranges  at  the  final  point  of  each  worker  trajectory. 


we  computed  optimal  solutions  of  each  instance  using  discrete  search  [57]  on  a  regular  grid 
with  40  ticks  per  dimension  covering  the  embedding  hypercube  of  original  robot  locations. 
The  regularity  of  the  grid  allows  us  compute  a  lower  bound  of  the  optimal  cost  based  on  the 
result  of  discrete  search  by  subtracting  the  maximum  possible  error  due  to  discretization 
(no  such  quality  bounds  are  readily  available  for  numerical  approximation  methods  to  the 
continuous  problem,  and  no  closed  form  solution  is  known). 

We  calculated  the  upper  bound  of  approximation  factor  for  every  problem  instance  by 
dividing  distributed  algorithm  for  Ordered  Frugal  Feeding  Problem  results  by  the  lower 
bound  of  optimal  cost.  Table  6.2  reports  the  statistics  of  these  approximation  factor  bounds 
for  each  distribution. 


Table  6.2:  Statistics  of  approximation  factor  bounds 


Wi  =  1 

wi  ~  £7[1,3) 

Wi  ~  U[  1, 100] 

Mean 

1.19 

1.22 

1.31 

Median 

1.19 

1.20 

1.25 

St.  dev. 

0.03 

0.08 

0.23 

Skewness 

0.67 

1.18 

4.27 

Kurtosis 

0.65 

1.65 

31.82 

The  results  show  increasing  variation  of  the  approximation  factor  bounds  with  increas¬ 
ing  variation  in  robot  weights.  Indeed,  distributed  algorithm  for  Ordered  Frugal  Feeding 
Problem  can  cause  the  tanker  to  move  suboptimally  in  the  direction  of  two  light  workers 
and  then  return  to  the  third  heavy  worker.  Hence,  distributed  algorithm  for  Ordered  Fru- 
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gal  Feeding  Problem  does  not  have  a  constant  approximation  factor.  However,  the  average 
quality  of  solutions  for  uniformly  distributed  problem  appears  very  good.  Thus,  the  method 
could  be  used  where  the  guaranteed  quality  of  results  obtained  in  non-scalable  centralized 
manner  should  be  sacrificed  in  favor  of  simple  decentralized  scalable  solution  with  good 
average  performance. 


6.5  Conclusion 

A  general  approach  for  development  of  decentralized  distributed  gradient  descent  optimiza¬ 
tion  algorithms  for  teams  of  embodied  agents  is  presented.  This  approach  uses  embodiment 
and  spatial  embedding  as  valuable  computational  resources  which  allow  to  reduce  or  elim¬ 
inate  communication  or  shared  memory  requirements  for  parallel  computation.  Spatial 
configuration  of  agents  serves  as  an  embodied  representation  of  the  current  approximation 
to  global  solution  which  is  accessed  by  means  of  sensing  and  refined  by  agents  moving  along 
local  antigradient  directions.  Two  non- trivial  optimization  tasks  of  energy-efficient  path 
planning  for  mutli- robot  teams  were  used  to  illustrate  the  embodied  approximation  ap¬ 
proach.  Resulting  distributed  algorithm  show  very  good  results  in  experimental  evaluation. 
These  algorithms  use  very  simple  mechanisms  that  result  in  energy  efficient  group  behavior. 
In  both  cases,  computationally  complex  optimization  tasks  are  solved  using  biologically  af¬ 
fordable  machinery  of  bearing- only  sensing.  Thus,  it  seems  promising  to  look  for  examples 
of  optimization  by  means  of  embodied  approximation  in  groups  of  animals.  Such  animal 
strategies  could  and  should  be  used  to  increase  the  efficiency,  and  eventually  autonomy,  of 
robot  teams. 

Admittedly  the  extent  to  which  embodied  approximation  can  substitute  communication 
or  shared  memory  is  limited  by  the  properties  of  sensors  and  environment.  Limited  sensor 
range  and  occlusions  will  limit  the  number  of  state  vector  components  that  could  be  accessed 
by  sensing.  However,  some  optimization  problems  by  their  nature  need  only  local  and  limited 
exchange  of  information.  For  the  rest  of  the  problems  embodied  approximation  may  still 
significantly  reduce  the  need  for  inter- agent  communication  and  should  be  considered  as  one 
of  the  available  resources. 

Future  work  includes  application  of  embodied  approximation  approach  to  other  opti¬ 
mization  problems  emerging  in  multi- agent  teams.  A  search  for  biological  examples  of  par¬ 
allel  gradient  optimization  with  embodied  approximation  is  another  interesting  direction. 
Finally,  accurate  mathematical  models  of  spatially  embedded  multi- agent  systems  with  cer¬ 
tain  sensing  capabilities  could  be  developed  and  the  computation  power  of  such  systems  can 
be  theoretically  studied  taking  into  account  embodied  approximation  as  a  computational 
resource. 
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Chapter  7 


Optimal  Robot  Recharging 
Strategies  for  Time-Discounted 
Labour 

Authors:  Jens  Wawerla  and  Richard  Vaughan 
7.1  Abstract 

Energy  is  defined  as  the  potential  to  perform  work:  every  system  that  does  some  work  must 
possess  the  required  energy  in  advance.  An  interesting  class  of  systems,  including  animals 
and  recharging  robots,  has  to  actively  choose  when  to  obtain  energy  and  when  to  dissipate 
energy  in  work.  If  working  and  collecting  energy  are  mutually  exclusive,  as  is  common  in 
many  animal  and  robot  scenarios,  the  system  faces  an  essential  two-phase  action  selection 
problem:  (i)  how  much  energy  should  be  accumulated  before  starting  work;  (ii)  at  what 
remaining  energy  level  should  the  agent  switch  back  to  feeding/ recharging?  This  paper 
presents  an  abstract  general  model  of  a  energy- managing  agent  that  does  time- discounted 
work.  Analyzing  the  model,  we  find  solutions  to  both  questions  that  optimize  the  value  of 
the  work  done.  This  result  is  validated  empirically  by  simulated  robot  experiments  that 
agree  closely  with  the  model. 


7.2  Introduction 

Since  the  early  years  of  robotics,  e.g.  [92],  roboticists  have  been  challenged  by  the  need  to 
supply  robots  with  energy.  For  mobile  robots  that  carry  their  own  limited  energy  store, 
the  key  question  is  “when  should  the  robot  refuel/ recharge?” .  A  standard  approach  in 
the  literature  and  in  commercial  robots  is  to  set  a  fixed  threshold  and  refuel  when  ever  the 
robot’s  energy  supply  drops  below  this  threshold  ([75],  [76]).  The  simplicity  of  this  approach 
is  appealing,  but  it  may  not  be  the  optimal  strategy.  For  example  [93]  showed  that,  in  a 
realistic  surveying  task,  an  adaptive  threshold  produces  a  higher  overall  work  rate  than 
a  static  threshold,  and  a  rate  maximizing  approach  outperforms  both  by  a  large  margin. 
The  biologic  ally- inspired  rate  maximizing  method  performs  well,  but  it  some  important 
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limitations  discussed  below.  This  paper  provides  a  more  sophisticated  model  for  rational 
recharging  robots 

Robots,  by  their  name1  and  nature,  are  supposed  to  perform  some  form  of  labour, 
e.g.  space  exploration,  entertainment,  rescue  missions,  clean-up,  assembly,  etc.  Most  tasks 
require  execution  in  a  timely  manner,  though  some  more  then  others.  For  example,  we  may 
be  willing  to  wait  for  a  day  or  two  for  the  latest  geological  observations  from  Mars,  while 
waiting  the  same  time  for  the  rescue  of  trapped  miners  or  ordnance  disposal  might  not  be 
acceptable.  The  standard  method  to  model  the  descr easing  value  of  work  over  time,  and 
thereby  encourage  timely  execution,  is  to  discount  by  some  factor  (3  in  discrete  timesteps  the 
reward  given  in  exchange  for  investment  ([87]),  where  investment  here  is  energy  dissipated 
in  labour.  The  inverse  of  discount  (1//3)  is  the  familiar  interest  rate,  of  savings  accounts 
and  credit  cards. 

The  laws  of  physics  dictate  that  energy  cannot  be  transferred  instantaneously,  or  in  other 
words,  refuelling  takes  time  and  this  time  cannot  be  spent  working.  If  a  robot  spends  an  hour 
refuelling,  it  starts  to  work  one  hour  later  and  since  the  reward  is  discounted  over  time,  it 
receives  a  smaller  payment  then  if  it  would  have  started  working  immediately.  But  the  initial 
charging  period  is  strictly  required,  as  no  work  can  be  done  without  previously  obtaining 
energy.  This  conflict  between  the  mutually  exclusive  tasks  of  refuelling  and  working  raises 
two  interesting  questions: 

Q1  How  much  energy  should  be  accumulated  before  starting  work? 

Q2  At  what  remaining  energy  level  should  the  agent  switch  back  to  obtaining  energy? 

Most  real-world  robot  systems  avoid  these  questions  by  maintaining  a  permanent  con¬ 
nection  to  an  energy  source,  e.g.  industrial  robotic  manipulators  wired  into  the  mains  power 
grid,  or  solar  powered  robots  which  are  capable  of  gathering  energy  while  performing  some 
task  at  the  same  time.  This  paper  addresses  the  more  interesting  class  of  machine,  including 
animals  and  mobile  robots,  that  must  obtain  and  store  energy  prior  to  working.  The  Q1,Q2 
action  selection  problem  must  be  solved  by  every  animal  and  long-lived  robot  in  some  way 
or  another.  Further  we  are  only  considering  rational  agents.  Any  introductory  text  book 
on  decision  maring  (e.g.  [81])  defines  an  agent  to  be  rational  if  it  always  selects  that  action, 
i.e.  an  answer  to  Q1  and  Q2,  that  returns  the  highest  expected  utility.  Here  we  assume  that 
utility  is  proportional  to  the  reward  obtained  by  working,  which  is  discounted  over  time. 

After  considering  related  work,  we  analyze  the  problem  in  terms  of  a  simplified  abstract 
model  which,  when  parameterized  to  approximate  a  particular  robot  system,  predicts  the 
optimal  answers  to  Q1  and  Q2.  We  validate  the  model  by  comparing  its  predictions  with 
data  empirically  obtained  from  a  simulated  robot. 

To  the  best  of  our  knowledge,  this  is  the  first  proposed  solution  to  this  general  robotics 
problem. 

7.3  Related  Work 

The  literature  on  robotic  energy  management  has  many  aspects  ranging  from  docking  mech¬ 
anisms,  energy-efficient  path- planning,  to  fuel  types.  The  most  relevant  aspect  to  this  work 
is  on  act  ion- select  ion.  Perhaps  the  most  standard  and  simple  way  to  determine  when  it  is 
time  for  the  robot  to  recharge  is  to  set  a  fixed  threshold.  This  can  either  be  a  threshold 

1  Webster:  Czech,  from  robota:  compulsoiy  labour 
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directly  on  the  energy  supply  as  in  [75]  or  on  time  elapsed  since  last  charging  as  in  [4].  The 
latter  is  usually  easier  to  implement  but  less  accurate,  because  one  has  to  have  some  model 
of  the  energy  supply.  However,  [93]  showed  a  fixed  threshold  policy  can  be  improved  upon. 
While  it  is  true  that  maximising  the  energy  intake  rate  maximizes  potential  work  rate,  it 
does  not  optimise  with  respect  to  when  the  work  is  done.  It  also  assumes  that  recharging  is 
always  valuable.  This  is  often  true,  but  not  always,  as  we  show  below.  Also  notable  is  that 
all  of  the  above  papers  refuel  the  robot  to  maximum  capacity  at  each  opportunity,  and  do 
not  consider  that  this  may  not  be  the  best  policy. 

[57]  consider  the  problem  of  energy  efficient  rendezvous  as  an  action  selection  problem, 
and  so  investigate  the  where,  but  not  the  when  and  how  long,  to  refuel.  [12]  had  robots  living 
an  a  closed  ecosystem  learn  to  ‘survive’.  Here  robots  learned  to  choose  between  recharging 
and  fighting  off  competitors.  Birk’s  agents’  value  function  of  ‘survivability’  is  different  to 
that  considered  here.  The  rational  robot  and  its  owner  are  interested  in  gaining  maximum 
reward  by  working  at  the  robot’s  task,  and  are  indifferent  to  the  lifespan  of  the  robot.  This 
is  a  key  difference  between  the  purpose  of  robots  and  animals. 

Although  intended  as  a  wake-up  call  for  psychology  research,  Toda’s  Fungus  Eater 
though  experiment ([83])  has  been  influential  in  the  robotics  literature.  The  survival  quest 
of  a  mining  robot  on  a  distant  planet  contributed  significantly  to  ideas  of  embodiment  and 
whole  agents  [65],  but  the  action  selection  problem  presented  has  yet  to  be  solved  in  more 
then  the  trivial  way  of  a  fixed  threshold  policy. 

[78]  and  [59]  investigate  work  -  refuel  cycles,  or  as  they  call  it  ’basic  cycles’,  and  show  a 
simple  rule,  based  on  cue  and  deficit,  can  solve  a  two  resource  problem.  The  cue-deficit  policy 
is  inherently  reactive  and  thus  fails  to  cope  with  the  cost  of  switching  between  behaviours, 
it  also  lacks  any  form  of  lookahead  or  planning,  so  it  is  difficult  to  see  how  it  would  handle 
discounted  labour  situations. 

From  McFarland’s  work,  it  is  a  small  step  into  the  vast  literature  on  behavioural  ecol¬ 
ogy,  from  which  [42] ,  [80]  and  [79]  are  strongly  recommended  starting  points.  Due  to  the 
biological  background  of  these  publications,  the  analogy  to  labour  and  reward  in  a  robotic 
case  is  not  obvious.  The  majority  of  this  work  uses  dynamic  programming  (DP)  as  a  means 
of  evaluating  models  of  animal  behaviour.  An  exception  is  that  does  not  rely  on  DP  is  [40] , 
who  investigates  the  biomechanics  of  land  based  animals  to  derive  models  of  optimal  fuel 
load  during  migration.  The  model  optimises  for  migration  time,  does  not  map  directly  into 
discounted  labour  or  the  cyclic  work-charge  lifestyle  of  the  long-lived  robot. 

It  is  known  that  animals  prefer  a  small,  immediate  reward  to  a  large  delayed  reward. 
So  animals  seem  to  do  some  form  of  time  discounting.  According  to  [48]  the  reason  seems 
to  be  that  animals  in  general  are  risk  averse.  From  a  robotics  point  of  view,  one  major 
issue  with  the  descriptive  models  of  behavioural  ecology  is,  they  lack  the  ‘how  does  the 
animal  actually  do  it’  prescriptive  description  and  hence  do  not  translate  readily  into  robot 
controllers.  Work  that  tries  to  bridge  the  gap  between  ecology  and  robotics  is  [72].  Here 
Seth  uses  ALife  methods  to  evolve  controllers  that  obey  Herrnstein’s  matching  law  (roughly: 
relative  rate  of  response  matches  relative  rate  of  reward),  which  again  is  in  the  domain  of 
rate  maximization. 


7.4  The  Model 

In  this  section  we  describe  the  behavioural  model  used  in  this  work.  In  order  to  keep  the 
analysis  tractable  we  choose  an  abstract,  slightly  simplified  model.  The  world  is  modeled 
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Figure  7.1:  Refuel  and  time  discounted  labour  model,  see  text  for  details 


as  two  distinct  spatially  separated  sites:  a  work  site  and  a  refueling  site.  Moving  between 
sites  has  a  non-zero  cost  (Figure  7.1).  The  robot  has  a  energy  storage  of  Eft ),  where 
0  <  E(t)  <  Eraax>  If  the  energy  supply  drops  to  zero  anywhere  but  at  the  refueling  site,  the 
robot  looses  is  ability  to  move  or  work  and  can  gain  no  more  reward.  The  robot  can  be  in 
one  of  four  states: 

•  refuelling  with  a  refuelling  rate  of  eV,  to  do  so  the  robot  has  to  be  at  the  refuelling 
site. 

•  transitioning  from  the  refuelling  site  to  the  work  site,  the  duration  of  this  transition 
is  rd  and  the  robot  has  to  spend  energy  at  a  rate  of  e^,  so  the  transitions  cost  in  term 
of  energy  is  rded 

•  working,  which  gets  the  robot  a  reward  of  R  =  (E&i,  where  is  it  time  when 

the  robot  starts  to  work  and  rw  is  the  duration  the  robot  works  for.  Therefore  the 
reward  the  robot  earns  by  working  is  discounted  with  a  discount  factor  0  <  (3  <  1. 
While  working  the  robot  spends  energy  with  a  rate  of  ew.  In  other  words,  the  robot 
turns  energy  into  work  and  therefore  reward.  In  case  where  the  robot  performs  several 
work  sessions,  the  reward  is  accumulated  and  only  the  overall  reward  is  at  interest  to 
the  owner  of  the  robot.  As  with  refuelling,  work  can  only  be  performed  at  the  work 
site. 

•  transitioning  from  the  work  site  to  the  refuelling  site,  the  duration  of  this  transition 
is  rd  and  the  robot  has  to  spend  energy  at  a  rate  of  ed 

The  robot’s  goal  is  to  achieve  as  much  reward  as  possible.  To  do  so,  it  has  to  make  two 
decisions,  (1)  when  to  stop  refuelling  and  resume  work  and  (2)  when  to  stop  working  and 
refuel.  We  mostly  refer  to  the  action  of  accumulating  energy  as  refuelling  and  not  at 
recharging  because  we  want  to  emphasis  the  general  nature  of  our  model. 

It  is  worth  pointing  out  that  in  a  real  world  scenario  all  important  variables,  namely 
the  energy  rates,  could  be  known  in  advance  or  are  easily  measured  by  the  robot.  Here 
we  assume  these  variables  to  be  constant,  though  in  an  actual  implementation  we  would 
use  averages  as  approximations.  It  would  also  be  feasible  to  do  some  form  of  piece  wise 
linear  approximation  of  the  energy  rates.  The  discount  factor  can  also  be  assumed  to  be 
known,  since  this  factor  is  task  dependent  and,  hence,  is  set  by  the  owner  or  designer  of 


42 


the  robot  or  by  some  external  market.  As  we  show  below,  even  if  all  else  is  bxed,  the  robot 
owner  can  use  the  discount  factor  as  a  control  variable  that  can  be  tweaked  to  bne  tune  the 
robots  behaviour.  Everything  else  is  predebned  by  the  tasks,  the  robot’s  construction  or  the 
environment. 

In  order  to  improve  readability,  we  need  to  introduce  some  additional  notation.  k\  =  A1- 
is  the  ratio  of  the  energy  rate  while  refuelling  to  the  rate  while  working.  Similarly,  ^2  — 
is  the  energy  rate  while  transitioning  to  the  energy  rate  while  working.  k%  =  ^  =  tt  is  the 
ratio  of  the  energy  rate  while  in  transition  and  the  energy  rate  refuelling.  rT  is  the  time  spent 
refuelling  during  one  refuel- work  cycle.  The  amount  of  work  the  robot  can  perform  is  limited 
by  the  energy  supply  the  robot  has,  so  we  express  the  potential  work  duration  as  a  function 
of  refuelling  and  transitioning  time  where  rw  =  rTk\  —  which  is  basically  the  amount 

of  time  the  robot  can  work  for,  given  the  amount  of  energy  the  robot  got  from  refuelling 
minus  the  energy  the  robot  has  to  spend  to  travel  to  the  work  site  and  back  to  the  charging 
station.  We  also  introduce  the  period  of  time  T  =  rT  +  2 +  rw  =  7>(1  +  ki)  +  2r^(l  —  k 2) 
as  the  length  of  one  refuel- work  cycle. 


7.5  When  to  stop  working 

Let  e(^)  be  the  energy  in  the  robot’s  storage  at  time  t.  At  what  energy  level  e(^)  <  ew^r 
should  the  robot  stop  working  and  transition  to  the  refuelling  site?  Since  the  value  of  work 
is  time  discounted,  work  that  the  robot  performs  now  is  always  more  valuable  then  the  same 
amount  of  work  performed  later.  This  creates  an  inherent  opportunity  cost  in  transitioning 
from  the  work  site  to  the  refuelling  site  because  it  takes  time  and  costs  energy  Tdew  that 
cannot  be  spent  working.  This  implies  that  the  robot  needs  to  work  as  long  as  possible  now 
and  not  later.  Hence  the  only  two  economically  rational  transitioning  thresholds  are: 

*  C o—yr  7~dPd 

The  robot  stops  working  when  it  has  just  enough  energy  left  to  make  it  to  the  refuelling 
station.  The  robot  will  spend  the  maximum  amount  of  energy,  and  therefore  time, 
working,  ensuring  the  highest  reward  before  refuelling.  Comparatively,  should  a  higher 
transitional  threshold  be  used,  the  robot  would  stop  working  earlier  and  refuel  earlier, 
but  discounting  results  in  a  smaller  reward.  Should  the  transitioning  threshold  be 
smaller,  the  robot  would  have  insufficient  energy  to  reach  the  refuelling  station.  In 
this  case,  the  robot  cannot  gain  any  further  reward  because  it  runs  out  of  fuel  between 
the  work  and  refuel  sites. 

*  dw — yr  =  0 

The  robot  spends  all  of  its  energy  working  and  terminates  its  functionality  while 
doing  so.  At  first  glance  this  option  seems  counter  intuitive,  but  one  can  imagine 
highly  discounted  labour  situations,  such  as  rescue  missions,  where  the  energy  that 
would  otherwise  be  spent  on  approaching  a  refuelling  site  is  better  spent  on  the  task 
at  hand.  This  might  also  be  a  rational  option  if  the  transition  cost  is  very  high,  e.g. 
NASA’s  Viking  Mars  lander  took  a  lot  of  energy  to  Mars  in  the  form  of  a  small  nuclear 
reactor,  rather  than  returning  to  Earth  periodically  for  fresh  batteries  (the  recent  Mars 
rovers  employ  solar  cells  to  recharge  their  batteries  originally  charged  on  Earth). 
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Figure  72:  General  discounting  m  refuel  -  work  cycles.  The  shaded  areas  are  periods  m 
which  the  robot  works  and  thus  earns  a  reward.  The  white  areas  correspond  with  time  m 
which  the  robot  does  not  obtain  any  rewards  because  it  either  travels  or  refuels 


7.5.1  Suicide  or  live  forever? 

Using  our  simple  mo  del,  we  can  determine  whether  a  robot  m  a  given  scenario  should 
terminate  while  working  or  continue  indefinitely  with  the  work- re  fuel  cycle.  Let 


/ 

T: 


T,-  +T,.[+T,,. 


fS'dt  =  1tfTV+r"! 


TV-pr*£ 


bT™  -  1 

ln(y?) 


(y.i) 


be  the  reward  obtained  from  spending  energy  or  time  during  the  first 

working  period  (see  figure  V. 2).  In  this  figure  the  shaded  areas  correspond  to  time  m  which 
the  robot  performs  work  and  thus  obtains  a  reward  proportional  to  the  size  of  the  shaded 
area.  Later  work  periods  are  discounted  more  strongly  and  hence  provide  a  smaller  reward. 
White  areas  correspond  to  times  m  which  no  reward  is  earned  because  the  robot  either  travels 
between  the  work  and  refuelling  site  or  it  refuels.  The  size  of  this  area  is  proportional  to 
the  opportunity  cost?  that  isv  reward  that?  m  principal^  could  have  been  obtained  if  the  time 
had  been  spent  working. 

Let  T  be  the  duration  of  one  full  work -refuel  eye \ev  that  is  working  -  transition  -  refuel 
-  transition t  or  T  =  +  r*  +  .  Therefore ,  the  reward  gamed  m  the  next  cycle  is 

the  initial  reward  discounted  by  T  and  becomes  j3TH q.  Subsequent  rewards  are  again 
discounted  by  T  and  so  the  reward  for  the  third  cycle  is  The  sum  of  all  rewards  if 

working  infinitely^  that  is  choosing  is 


CO  . 

1^=110^^  =Xo-—^ 
';=0  1  '■ 


In  practise  no  system  will  last  forever^  so  this  analysis  is  slightly  biased  towards  infinite  life 
histories. 
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If  the  robot  chooses  ew^r  =  0  it  gains  the  initial  reward  Ro  plus  a  one  time  bonus  of 
R+  =  /  /Pdt  =  ^+T-+T“  (7.3) 

Jrr+Ttl+Tw  lli\P) 

by  spending  the  energy  required  for  transitioning  on  working.  The  reward  gained  over  the 
live  time  of  the  robot  (which  is  fairly  short)  is  RriP  =  Ro  +  ii+. 

So  the  answer  to  Q2  is  that  the  rational  robot  selects  that  threshold  ew^r  that  achieves 
the  higher  overall  reward,  so  it  picks 
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(7.4) 


Since  the  discount  function  J  ft* dt  belongs  to  the  class  of  memory-less  functions,  we  only 
have  to  calculate  eq.  7.4  once,  in  other  words  if  it  is  the  best  option  to  refuel  after  the  first 
work  cycle  it  is  always  the  best  option  to  do  so  and  vice  versa. 


7.6  How  much  energy  to  store 

We  have  shown  how  to  determine  a  threshold  for  transitioning  from  work  to  refuelling.  In 
this  section  we  will  analyse  when  to  stop  refuelling  and  resume  work,  or  phrased  differently, 
how  much  energy  to  accumulate  before  starting  to  work.  Energy  and  time  are  interchange¬ 
able  elements,  provided  that  we  know  the  rate  at  which  energy  is  spent  and  gained.  Since 
discounting  is  done  in  the  time  domain,  our  analysis  equates  energy  with  time  for  simplicity. 
Based  on  this,  we  can  ask  the  time  equivalent  of  Ql:  Tow  long  should  the  robot  refuel  for?7 
We  call  this  refuelling  duration  7>.  To  be  rational,  the  robot  must  refuel  long  enough  to 
gain  enough  energy  to  make  the  trip  to  the  work  site  and  back,  that  is  2e^r^,  otherwise  it 
would  have  to  turn  around  before  reaching  the  work  site  and  thus  will  not  gain  any  reward. 
Refuelling  after  the  storage  tank  is  full  is  time  wasted  that  would  better  be  spent  obtaining  a 
reward.  Therefore  the  refuelling  time  is  limited  to  2rdks  <  rT  <  Emaa;eV_1.  In  the  following 
we  assume,  without  loss  of  generality,  the  robot  starts  at  the  refuelling  site  with  an  empty 
fuel  tank.  Assuming  differently  will  just  result  in  shift  of  the  analysis  by  a  constant  factor, 
but  will  not  change  the  overall  conclusions. 


7.6.1  Acyclic  tasks 

First  we  examine  situations  in  which  the  robot  has  to  refuel  for  a  task  that  has  to  be  done 
only  once,  that  is  the  robot  refuels,  performs  the  task,  and  returns  to  the  refuelling  site. 
Depending  on  the  time  spent  refuelling  the  robot  obtains  the  following  reward  during  the 
upcoming  work  period. 


rrrki -2rdk2 

R(rr)  =  /  fidt  =  (3Tr+Td  /  fidt 

J Tr+Td  J  0 

Next  we  need  to  find  the  fr  that  maximises  R(rr ),  which  is 

l°g£  (kT+ 1)  +  2rdfc2 


rr  =  argmax(R(Tr))  = 


ki 


(7.5) 


(7.6) 
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Figure  V.-l-:  Reward  depending  on  refuelling  time  with  in  example  configuration  = 
0  A  Pt  =0.ri,ri  =  0.07,  r,.  =  8Ss 


Figure  7.5  shows  -an example  reward function  (eq.  7.0)  depending™  the  refuelling  duration 
■tv  .  Using  eq.  7.6  we  calculate  that  for  this  particular  configuration  the  reward  is  niaximdsed 
when  the  robot  refuels  for  =  36.6254....  If  the  fuel  tank  is  filled  before  that  time,  the  best 
the  robot  can  do  is  return  to  work.  This  will  give  it  the  highest  reward  achievable,  but  the 
designer  should  keep  in  mind  that  there  might  exist  a  class  of  robots  with  a  larger  fuel  tank 
that  will  achieve  a  higher  reward.  Note  that  if  the  robot  stops  refueling  at  even  if  its 
energy  store  is  not  full  to  capacity,  and  transitions  to  working,  it  earns  the  highest  reward 
possible.  'ID  our  knowledge  this  has  not  been  stated  expldtly  in  the  robotics  literature 
before.  It  is  generally  assumed  that  robots  should  completely  recharge  at  each  opportunity, 
but  this  is  not  always  the  optimal  strategy. 


V.6.2  Cyclic  tasks 

In  cyclic  tasks  a  robot  is  required  to  always  return  to  work  after  resupplying  with  energy. 
Here  the  analysis  is  slightly  different  then  in  the  acyclic  case  because  the  refuelling  time  of 
the  current  cycle  not  only  influences  the  duration  and  length  of  the  work  period  of  this  cycle 
but  of  all  cycles  to  come.  Hence,  we  should  select  a  refuelling  threshold  that  maximises  the 
overall  reward  The  overall  reward  is  calculated  by  (see  fig.  7.2) 


p^(Tr)=p^;ry-r= 

;=a 


i-TV  +T.J 


(i 


(V.r) 


Unfortunately,  it  seems  impossible  to  find  a  closed  form  solution  to  ^_  =  c^^jrawc  (JiV...  f'O)  ■ 
However,  eq.  7.7  can  easily  be  evaluated  for  a  given  tv  and  so  calculating  the  reward  for 
each  of  a  finite  set  of  values  for  rT  and  selecting  the  one  that  maximises  the  reward  is  quite 
practical.  In  any  real  application  tit:  number  of -tv  to  be  tested  is  limited  and  possibly  rather 
small,  in  the  order  of  a  few  thousand.  This  is  because  any  real  robot  will  have  a  finite  energy 
storage  and  ary  practical  scenario  will  require  oily  limited  sampling  due  to  the  resolution 
of  the  fuel  gauge.,  the  uncertainty  in  the  environment,  etc-  In  the  case  of  our  Chatterbox 
Robot  (see  below),  the  battery  capacity  is  2.3  Ah  and  the  fuel  gauge  has  a  resolution  of 
imA,  resulting  in  less  then  5000  calculations  for  an  exhaustive  search. 
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Figure  7.4:  Office  10k  environment  with  i  charging  station^  aid  i  work  site  "W*.  The  red 
line  is  the  stylised  path  the  robot  travels  on. 

7.7  Experiments 

In  this  section  we  present  er^eriments  to  validate  the  theoretical  results  described  in  detail 
above.  All  experiments  were  performed  using  the  robot  simulator  Stage2.  The  simulated 
robot  uses  simulated  electrical  energy,  where  we  assume  charging  and  discharging  to  be 
linear,  with  constant  current  for  charging  working  /,,,  and  driving  .  We  further  ig¬ 
nore  any  effects  caused  by  the  docking  mechanism.  change  in  battery  chemistry  or  ambient 
temperature. 

In  all  experiments  we  rouglnly  model  a  Chatterbox  robot,  a  robot  designed  and  build  at 
SF'TJ  based  on  an  IRobot  I  Create  platform.  This  robot  has  a  battery  capacity  of  approxi¬ 
mately  2.8Ah  and  draws  about  2A  while  driving.  Vte  defined  an  abstract  work  task  which 
consumes  4A  of  current.  Once  at  a  charging  station,  the  robot  docks  reliably  and  recharges 
with  2A.  The  world  the  robot  operates  in  is  office-like  with  one  charging  station  and  one 
work  site  shown  in  fg.  7.4.  The  obstacle  avoidance  and  navigation  controller  drives  the 
robot  horn  the  charging  station  to  the  work  site  and  vice  versa  in  approximately  t,j  =  85s. 
Due  to  naturally  accruing  noise  in  the  experimental  setup  the  travel  time  may  vary  by  up 
to  6  seconds.  While  working,  the  robot  receives  one  unit  of  reward  per  second,  discounted 
by  ft-  Discounting  occurs  on  a  one  second  basis. 

V.V.i  Cyclic  Task 

The  goal  of  this  experiment  is  to  evaluate  how  closely  our  analysis  from  section  7.G.2  matches 
a  robot  in  a  simulated  environment.  In  this  experiment  the  robot's  task  is  to  recharge  for 
some  time  tv,  proceed  to  the  work  site,  work  until  the  batteiy  energy  drops  to 
return  +-=■  the  charging station,  and  repeat  the  process.  The  rewardfbrwork  is  discounted  by 
=  0.9997.  To  find  out  which  jv  maximises  the  reward  we  varied  the  threshold  for  leaving 
the  charging  station  in  each  try.  A  trial  lasted  for  50000  seconds  (rz.  is.. 3  tours) . 
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Figure  V.  5:  Comparing  analytical  and  simulation  results  lor  accumulated  reward  from  a 
cyclic  task  depending  on  refuelling  time  with  an  example  configuration  J ^  =  2.0V  = 

=  4.0fJS=  0.999^rd  faSS 

Figure  V.  5  compares  the  accumulated  reward  gamed  over  r*  from  the  simulation  and  from 
the  best  solution  obtained  from  the  model  by  iterating  over  eq.  V.V.  The  recharging  time 
that  maximises  the  reward  is  predicted  by  the  model  to  be  4  =  1219  and  in  the  simulation 
r*  =  11  VO.  The  difference  comes  from  the  variation  m  time?  and  therefore  energy,  the  robot 
requires  to  travel  between  the  charging  station  and  the  work  site.  Not  only  does  this  change 
the  starting  time  of  work  which  influences  the  rewar  d,  it  also  makes  it  necessary  to  give  the 
robot  a  small  amount  of  spare  energy  to  ensure  it  would  not  run  out  of  battery.  Tins,  m  turn? 
delays  charging  and  thereby  influences  the  reward  gamed.  However ,  the  empirical  results 
agree  qualitatively  with  the  values  predicted  by  the  models  and  the  optimal  recharging  time 
predicted  by  the  model  was  within  4%  of  that  observed  m  the  simulation. 

7.7.2  Acyclic  Task 

As  before^  we  perform  this  experiment  m  order  to  compare  the  theoretical  results  with  a 
simulation.  The  setup  is  the  same  as  m  the  cyclic  task  experiment  with  the  difference  that 
the  robot  only  has  to  perform  one  charge-work  cycle.  Figure  V.  6  compares  the  simulation 
results  to  the  analytical  results.  Where  the  general  shape  of  the  curve  is  similar  to  that  m 
the  cyclic  task?  it  is  worth  to  point  out  that  the  maximum  reward  is  gamed  with  a  larger 
charging  threshold.  This  is  intuitively  correct  as  the  robot  has  only  once  chance  to  obtain 
a  reward.  It  can  be  (depen dmg  on  the  discount  factor)  beneficial  to  begin  work  later  ?  but 
to  work  for  a  longer  period.  Fbr  our  configuration t  the  most  profitable  theoretical  charging 
time  is  =  2SV2.  V  and  the  best  simulation  results  were  obtained  with  r*  =  2SS0.  A  gam 
the  differenc e  between  the  theoretical  and  experimental  results^  barely  visible  m  the  plot? 
are  due  to  imprecision  m  the  robot  simulation. 
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Figure  V.  6:  Comparing  analytical  and  simulation  results  for  accumulated  reward  foam  an 
acyclic  task  depending  on  refuelling  time  with  an  example  configuration  J ^  =  2.0V  = 

=  4.0fJS=  0.999^rd  faSS 


7.7.3  Once  or  forever 

In  a  further  experiment  we  investigate  the  circumstances  under  which  it  is  more  profit able? 
and  hence  ration al?  for  the  robot  to  fully  deplete  its  energy  supply  while  working  and  when 
it  is  better  to  choose  a  perpetual  refuelling  policy  As  m  the  previous  experiments  we  use 
a  simulated  Chatterbox  robot  with  the  previously  described  parameters  m  the  offic e-like 
environment.  Fbr  this  scenario^  we  vary  the  discount  rate  between  0.9S50  and  0.9999  m 
0.0005  increments  and  run  two  sets  of  simulations.  In  the  firsts  the  robot  depletes  it?s 
energy  supply  while  working^  that  is t  we  choose  the  leave  work  threshold  =  0.  Fbr  the 

second  set?  we  choose  a  leave  work  threshold  that  causes  the  robot  to  keep 

performing  work- refuel  cycles  forever.  Since  we  change  the  discount  rate  we  h ave  to  adapt 
the  leave  refuelling  site  threshold  m  order  for  the  robot  to  earn  the  higjiest  possible  reward. 
Fbr  this  determine  the  optimal  threshold  m  the  same  way  as  for  the  previous  experiments. 
Figure  7  V  depicts  the  rewards  obtained  for  different  discount  factors  with  each  policy  As 
the  graph  further  shows t  for  higher  discounting  (smaller  discount  rate)?  it  is  beneficial  for 
the  robot  to  choose  a  one  time  work  policy  Conrersely^  for  smaller  discounting  (higjier 
it  pays  to  keep  working.  The  theoretical  discount  rate  for  switching  the  policy  from  one 
work  period  to  an  infinite  work  refuel  cycle  is  =  0.99V9?  which t  as  the  graph  shows t  closely 
resembles  the  experimental  result. 


7.8  Discussion  and  Conclusion 

We  outlined  a  theoretical  analysis  of  when  to  refuel  and  for  how  long  to  refuel  a  robot 
m  situations  where  the  reward  for  the  robot ?s  objective  is  discounted  over  time.  This 
discounting  isv  more  often  then  notv  ignored  m  robotics  literature^  although  it  is  at  the  very 
base  of  rational  behaviour  ([SI]).  We  took  theoretical  results  and  demonstrated  that  they 
apply  to  a  simulated  robot.  In  these  simulations  we  assumed  the  location  of  and  the  distance 
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Figure  V.  V:  Reward  obtained  for  different  discount  factors  with  two  leave  work  thresholds. 
Configuration  =  2.0t  =  2.0 =  4.0 ^  S5 


between  work  and  refuelling  station  to  be  known.  This  is  reasonable  m  the  state  of  the  art 
m  mapping  and  localization^  m  a  wide  range  of  scenarios.  We  further  assumed  the  average 
energy  spending  rates  to  be  constant  and  known,  something  achievable  m  most  cases.  One 
assumption  made  that  simplifies  a  real-world  robot  scenario  is  the  refuelling  rate.  Gasoline- 
powered  vehicles  which  refuel  from  a  standard  gas  station  have  a  constant  refuelling  rate?  or 
close  to  it.  However t  the  charging  rate  of  a  battery  may  depend  on  many  factors  including 
the  charging  method  used?  temper ature,  battery  chemistry^  and  the  current  capacity  of  the 
battery.  One  useful  extension  of  our  model  would  be  to  include  a  realistic  chemical  battery 
recharge  transfer  function. 

This  paper  has  presented  and  analyzed  a  core  action  selection  problem  for  autonomous 
agents  such  as  animals  and  mobile  robots:  how  much  to  fuel  before  working,  and  when  to 
abandon  working  and  return  to  fueling^  such  that  the  value  of  discounted  work  is  maximized. 
A  simple  model  readily  provides  answers  to  these  questions  and  closely  predicts  the  observed 
behaviour  of  a  robot  simulation.  Whille  the  model  is  simple^  it  is  very  general^  and  these 
results  suggest  that  it  could  be  of  practical  as  well  as  theoretical  interest.  We  propose  it  as 
a  baseline  to  build  upon. 
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Chapter  8 

Adaptive  multi-robot  bucket 
brigade  foraging 


Authors:  Adam  Lein  and  Richard  Vaughan 

8.1  Abstract 

Bucket  brigade  foraging  improves  upon  homogeneous  foraging  by  reducing  spatial  inter¬ 
ference  between  robots,  which  occurs  when  robots  are  forced  to  work  in  the  same  space. 
Robots  must  spend  time  avoiding  one  another  instead  of  carrying  out  useful  work.  Bucket 
brigade  foraging  algorithms  restrict  the  motion  of  each  robot  to  at  most  some  fixed  distance 
from  its  starting  location.  We  reproduce  the  performance  of  known  bucket  brigade  foragers, 
and  then  present  a  new  controller  in  which  robots  adapt  the  size  of  their  foraging  area  in 
response  to  interference  with  other  robots,  improving  overall  performance.  This  approach 
also  ha s  the  potential  to  cope  with  nonuniform  resource  distributions. 

8.2  Introduction 

The  foraging  problem  is  a  task  in  which  one  or  more  agents  must  find  and  collect  target 
objects  and  deliver  them  to  a  “home  area”.  The  simplest  adaptation  of  this  problem  to 
multi- robot  systems  is  for  many  robots  to  independently  solve  the  foraging  task,  a  solution 
known  as  homogeneous  foraging . 

[74]  explore  foraging  strategies  in  large-scale  multi- robot  systems.  The  key  variable  in 
their  experiments  is  spatial  interference:  destructive  interactions  between  robots  forced  to 
perform  work  in  the  same  space.  Multi- robot  systems  are  advantageous  only  when  what 
might  be  called  the  marginal  performance  -  the  benefit  to  performance  of  adding  a  single 
robot  to  the  system  -  is  positive.  However,  as  those  authors  demonstrated,  there  comes  a 
time  when  this  is  no  longer  so,  when  the  loss  of  performance  due  to  interference  between 
robots  outweighs  the  gain  in  work  done  by  having  more  workers. 

[35]  and  [63]  describe  bucket  brigading ,  in  which  each  robot  is  restricted  to  a  finite 
search  area,  and  instead  rely  on  their  coworkers  to  both  deliver  pucks  into  their  search 
area,  and  remove  pucks  out  of  it.  By  restricting  each  robot  to  a  finite  area  whose  size  is 
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determined  a  priori,  interference  is  ameliorated.  [74]  empirically  investigate  the  performance 
of  large  groups  of  robots  as  a  function  of  varying  search  area  sizes;  homogeneous  foraging 
corresponds  to  search  areas  of  infinite  radius.  [63]  describe  the  expected  performance  of 
multi-robot,  space-constrained  systems  as  a  curve  to  which  the  R  =  40m  curve  in  Figure  8.2 
roughly  corresponds;  the  curve  has  a  local  maximum,  after  which  the  marginal  performance 
is  negative. 

In  this  research,  we  investigate  an  approach  to  bucket  brigading  that  does  away  with  a 
priori  search  radii  by  allowing  robots  to  adapt  their  search  radii  in  response  to  interference 
with  other  robots,  improving  overall  perfomance. 

8.3  Related  work 

Foraging  is  a  common  analogy  for  a  wide  variety  of  robot  tasks,  including  exploration  and 
mapping,  search,  and  actual  objects  retrieval. 

[93]  applied  the  rate- maximizing  foraging  model  to  a  single  robot  performing  the  task 
of  foraging  over  a  long  period  of  time.  The  robot  had  a  finite  energy  supply,  and  was 
required  to  travel  to  a  charging  station  to  recharge  its  batteries.  While  recharging,  and  while 
traveling  between  the  work  site  and  the  charging  site,  the  robot  is  not  doing  work.  The 
authors  presented  a  scalable,  online,  heuristic  algorithm  for  the  robot  to  recharge  efficiently, 
maximizing  the  proportion  of  its  time  it  spends  working. 

In  [103],  building  on  [16],  the  problem  of  spatial  interference  in  multi-robot  systems  was 
addressed  through  the  use  of  aggressive  display  behaviors.  Several  robots  were  required  to 
perform  a  transportation  task  (akin  to  our  foraging  task)  in  shared  space.  Robots  selected 
an  “aggression  level”  based  on  the  amount  of  work  they  had  invested  up  to  that  point.  The 
discrepancy  in  aggression  levels  between  interfering  robots  was  used  to  break  the  symmetry 
that  would  otherwise  have  lead  to  deadlock.  The  authors  showed  their  approach  to  be 
effective,  both  in  simulation  and  in  a  real-world  implementation. 

[53]  formally  modeled  the  effect  of  interference  on  the  performance  of  a  swarm  of  foraging 
robots.  Their  model  formulated  as  a  system  of  coupled  hr st- order  nonlinear  differential 
equations.  They  found  that  group  performance  grows  sublinearly  with  group  size,  so  that 
individual  performance  actually  decreases  with  increasing  group  size.  Simulations  verified 
the  predictions  of  their  model. 

[69]  performed  experiments  in  which  real  robots  perform  a  foraging  task  using  a  variety 
of  simple  communication  methods.  Robots  communicated  by  hashing  a  light  bulb  under 
various  circumstances.  The  authors  showed  that  communication  can  reduce  the  variance 
in  the  robots7  performance.  In  contrast,  this  research  does  not  use  explicit  communication 
between  robots;  communication  is  implicit,  however,  in  that  robots  must  alter  their  behavior 
in  the  short  term  in  response  to  the  presence  of  other  robots  (collision  avoidance),  and  in 
the  long  term  by  adapting  a  parameter  of  their  behavior  (discussed  later). 

8.4  Simulation 

In  order  to  reproduce  the  results  of  [74],  and  compare  them  to  the  performance  of  the 
adaptive  bucket  brigaders,  we  developed  a  simulator  similar  to  that  in  [74].  A  description 
follows: 

Robots  are  located  on  a  64  meter-square  plane  scattered  with  pucks,  in  the  northeast 
corner  of  which  plane  is  a  3  m  quarter  circle  “home  area77 .  Robots  are  equipped  with  grippers 
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which  they  can  use  to  retrieve  these  pucks,  and  when  a  puck  is  dropped  off  in  the  home 
area,  it  is  said  to  have  been  foraged. 

Robots  can  move  forward  at  a  rate  of  0.1  m/sec,  and  can  turn  to  either  side  at  a  rate 
up  to  once  every  five  seconds.  It  takes  four  seconds  to  retrieve  a  puck,  but  a  carried  puck 
can  be  dropped  instantaneously. 

Robots  can  sense  walls  and  other  robots  within  0.6  m  of  their  centers  without  error 
(compared  with  0.5  m  in  [74]),  via  the  use  of  12  radially  oriented  sensors.  They  cannot 
sense  pucks  unless  the  puck  in  question  is  located  directly  within  the  grip  of  their  grippers, 
and  this  sensing  is  also  without  error.  Robots  can  determine  the  direction  towards  the  home 
zone  at  any  time;  [74]  explain  this  as  the  use  of  a  “four- bit  compass” .  Robots  use  the  data 
from  their  proximity  sensors  to  avoid  walls  and  other  robots. 

[74]  adds  a  variety  of  noise  to  each  parameter  in  their  simulation.  For  simplicity,  we  have 
dispensed  with  this  noise,  except  where  explicitly  mentioned. 

8.4.1  Parametrized  bucket  brigading 

The  purpose  of  the  overall  robot  system  is  to  retrieve  pucks  and  deliver  them  into  the  home 
area  -  to  forage  the  pucks.  In  the  bucket  brigade  approach  to  this  problem,  individual 
robots  do  not  attempt  to  carry  a  puck  all  the  way  to  the  home  zone  themselves,  but  rather 
merely  to  shift  the  distribution  of  pucks  towards  the  home  area. 

Each  robot  will  attempt  to  stay  within  a  zone,  the  size  and  location  of  which  it  keeps 
track.  The  robot  can  determine  how  far  it  is  from  the  center  of  this  zone,  and  can  tell  the 
direction  towards  the  center  of  the  zone.  However,  this  memory  is  noisy;  the  zone  within 
which  any  robot  stays  drifts  on  a  random  walk  at  a  rate  of  0.01  m/sec. 

The  robot  searches  via  a  naive  algorithm  within  its  zone.  If  it  ever  leaves  the  zone,  it 
will  drop  of  any  puck  it  may  be  carrying,  return  towards  its  zone,  and  continue  searching. 
If  it  ever  discovers  a  puck,  it  will  retrieve  it  and  head  towards  the  home  are.  The  effect  is 
that  a  “brigade”  of  similarly- behaving  robots  will  slowly  transport  pucks  from  one  robot 
(zone)  to  the  next,  gradually  bringing  the  puck  closer  to  the  home  area.  Of  course,  ultimate 
delivery  of  the  puck  requires  a  connected  sequence  of  overlapping  zones  ending  in  the  home 
area,  but  this  may  be  achieved  over  time  (even  if  never  simultaneously)  due  to  the  drift  of 
the  zones. 

Shell  and  Mataric’s  robots  all  share  the  same  ranges.  In  the  following  sections,  we  will 
explore  other  approaches  to  assigning  ranges  to  robots.  In  any  given  experiment,  every 
robot  uses  the  same  approach  to  range  selection. 

8.4.2  Radial  parametrization 

Given  that  the  emergent  behavior  of  the  robots  is  to  shift  the  distribution  of  the  pucks 
towards  the  home  area,  the  first  natural  modification  to  the  controller  would  be  to  allow 
the  range  parameter  of  robots  to  vary  with  the  distance  to  the  home  area.  Effectively, 
we  are  discarding  the  assumption  that  the  distribution  of  pucks  is  uniform,  but  supposing 
that  they  may  be  more  densely  distributed  near  the  home  area,  and  therefore  a  different 
range  parameter  would  be  optimal.  To  demonstrate  this,  we  simulated  robots  whose  range 
parameter  varied  linearly  with  distance  from  the  home  area. 

Let  us  consider  the  following  idealized,  one- dimensional  situation:  robots  and  pucks  sit 
uniformly  distributed  on  a  line  of  length  L,  with  the  home  zone  at  one  end.  Every  robot 
collects  a  puck  at  the  same  time,  drives  a  fixed  distance  D  towards  the  “home”  end,  deposits 
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Figure  8.1:  Typical  density  distribution  of  pucks  after  (a)  40  minutes  and  (b)  80  minutes.  One 
square  is  2  m  X  2  m;  darker  squares  indicate  a  higher  concentration  of  pucks.  The  home  area  is  in 
the  northeast  corner  of  the  world.  In  this  simulation,  pucks  were  not  added  to  the  world  after  the 
start  of  the  experiment. 


its  puck,  and  then  returns  to  its  starting  location,  at  which  point  the  process  repeats.  Let 
pn(r),  where  r  is  the  distance  from  a  given  point  to  the  home  end,  be  the  density  of  pucks 
at  that  point  after  n  of  these  cycles  has  taken  place.  po(0  is  some  constant  (the  initial  puck 
density).  Then 


/  (1  -  c)pn(r)  if  r  >  L-R, 

Pn+i  v  )  -  |  (1  _  c)pn(r)  +  cpn  (z^d)  Otherwise.  h 

In  our  simulation,  an  analogous  process  occurs  in  two  dimensions.  Puck  densities  at  two 
time-steps  are  displayed  in  Figure  8.1  during  a  typical  run  of  an  experiment  in  which  pucks 
are  not  added  to  the  system  as  fast  as  they  are  removed  from  the  system  by  foraging  (pucks 
in  the  other  experiments  in  this  work  are  added  just  as  fast  as  they  are  foraged,  to  maintain 
constant  average  puck  density). 

In  all  experiments,  pucks  are  initially  distributed  at  random.  However,  it  can  clearly 
be  seen  that  as  soon  as  the  robots  interact  with  the  pucks,  the  distribution  becomes  less 
random,  biased  toward  the  home  area  —  the  system  has  a  form  of  entropy  that  decreases 
as  a  result  of  the  work  of  the  robots.  Since  the  optimal  search  radius  for  a  robot  depends 
on  the  density  of  pucks,  once  the  density  of  pucks  changes,  the  robot’s  original  choice  for 
search  radius  may  no  longer  be  optimal.  Ideally,  robots  would  know  and  select  the  optimal 
choice  for  search  radius  on  an  ongoing  basis,  but  in  these  experiments,  robots  are  not  given 
enough  information  to  achieve  this  ideal.  In  the  next  section,  we  propose  a  method  for 
approximating  this  ideal. 

8.4.3  Adaptive  range  selection 

The  aforementioned  ideal  puts  an  extra  burden  on  the  robot  —  it  must  be  constantly  aware 
of  the  distance  between  the  center  of  its  zone  and  the  center  of  the  home  area.  Alternatively, 
it  must  be  able  to  measure  the  local  puck  density.  In  place  of  these  assumptions  we  may 
also  allow  robots  to  adaptively  select  their  range  parameter  using  purely  local  information. 
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Fixed  range,  density=0.781 


Figure  8.2:  Performance  with  fixed  and  adaptive  search  radii  at  0.781  pucks /m2.  R  is  the  search 
radius  of  each  robot  in  the  trial.  The  curves  labeled  dR+  show  the  results  for  adaptive  range 
selection. 


In  adaptive  range  selection ,  a  robot  will  continuously  increase  its  range  parameter  at  rate 
except  while  it  is  avoiding  a  collision  with  another  robot,  to  which  situation  the  robot 
will  react  by  shrinking  its  zone  at  rate  dR~ ,  thus  making  it  less  likely  to  interfere  with  other 
robots  in  the  future. 

Results  for  adaptive  range  selection  are  included  in  Figure  8.2  and  8.3. 

8,4,4  Experimental  design 

Initially  we  followed  [74]  in  experimental  design.  The  following  parameters  were  varied: 
puck  density  (0.781/m2  and  3.125/m2);,  search  area  radius  (5,  10,  20,  30,  40,  or  50  m),  and 
number  of  robots  (20,  40,  60,  80,  ...,  500).  The  task  was  simulated  for  each  combination  of 
parameters  for  2000  simulated  seconds,  and  the  number  of  pucks  foraged  after  that  time  was 
recorded.  Twenty  such  trials  were  run,  each  with  a  different  initial  distribution  of  pucks; 
to  control  for  robot  position,  robots  were  initially  placed  on  a  square  lattice.  The  reported 
results,  in  Figures  8.2  and  8.3,  are  the  averages  of  those  twenty  trials.  Error  bars  indicate 
the  standard  deviations  of  the  twenty-trial  experiments. 

Next,  we  tested  our  adaptive  range  selection  controller  using  the  same  experimental 
setup.  For  these  experiments,  search  radii  were  allowed  to  increase  by  dR+  =  0.1  m/s  and 
decrease  by  dR~  =0.05  m/s,  biasing  the  robots  towards  the  limit  of  homogeneous  foraging 
in  the  absence  of  significant  interference.  Each  robot  began  with  a  small  range  of  5  m. 
For  each  parameter  set,  twenty  trials  were  run,  and  mean  performances  were  plotted  with 
standard  deviations  shown. 
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Fixed  range,  density=3.125 


number  of  robots 

Figure  8.3:  Performance  with  fixed  and  adaptive  search  radii  at  3.125  pucks/m2.  R  is  the  search 
radius  of  each  robot  in  the  trial.  The  curves  labeled  dR+  show  the  results  for  adaptive  range 
selection. 


8.5  Results 

Our  first  step  was  to  reproduce  the  results  in  [74].  Visual  analysis  of  the  data  in  Figures  8.2 
and  8.3  indicates  that  this  was  accomplished,  in  that  increasing  the  radius  of  robots7  search 
areas  in  the  fixed- radius  regime  led  to  an  increase  in  the  marginal  benefit  of  adding  robots 
(i.e.,  to  the  slope  of  the  curves  in  those  graphs),  but  only  up  to  a  point:  eventually,  adding 
more  robots  decreases  the  performance  of  the  system  as  overcoming  interference  begins  to 
dominate  the  robots7  behavior.  These  critical  points  are  clearly  visible  as  the  significant 
local  maxima  in  the  R  =  30  m,  R  =  40  m,  and  R  =  50  m  curves. 

There  is  a  noteworthy  distinction  in  that  robots  in  our  experiments  only  foraged  approx¬ 
imately  half  the  pucks  that  those  of  [74]  did;  this  indicates  more  a  quantitative  difference  in 
the  efficacy  of  the  robots7  controller  programs  than  a  qualitative  failure  of  our  simulations  to 
agree  at  the  heart  of  the  matter:  that  interference  affects  a  growing  population  later  when 
the  individuals7  foraging  spaces  are  larger,  which  is  indicated  by  the  relative  shapes  of  the 
curves. 

Our  adaptive  range  selection  algorithm  performed  at  least  as  well  as  the  fixed  ranged 
controllers  in  simulation,  and  scaled  better.  Fixed  range  algorithms  suffered  from  one  of 
two  problems:  robots  with  small  search  areas  did  not  gather  many  pucks,  and  robots  with 
large  search  areas  interfered  too  much  and  the  critical  point  at  which  the  marginal  benefit 
of  increasing  the  number  of  robots  was  reached  when  the  number  of  robots  was  still  small. 
While  the  robots  using  adaptive  range  selection  did  not  gather  as  many  pucks  when  the 
number  of  robots  was  small  as  did  robots  with  large,  fixed  search  radii,  increasing  the 
number  of  robots  always  increased  the  performance  of  the  group. 

Also  noteworthy  is  that  the  adaptive  controllers  performed  more  consistently,  as  indi- 
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cated  by  tighter  error  bars  on  those  curves  than  on  the  fixed- radius  performance  curves. 
Since  adaptation  to  interference  is  a  form  of  implicit  communication,  this  is  in  agreement 
with  the  findings  of  [69]. 

Adaptive  range  selection  was  sensitive  to  variations  in  dR+  and  dR~ .  If  dR+  was  too 
small,  adaptive  selection  underperformed  the  fixed  range  foragers.  Figures  8.2  shows  results 
when  dR+  =  0.01,  a  fifth  of  dR~ .  In  that  case,  the  adaptive  controller  performs  no  better 
than  the  worst-performing  fixed-range  controller  we  tested.  This  is  not  altogether  surprising, 
since  the  search  radius  in  the  worst  fixed- range  controller  and  the  initial  search  radius  in 
the  dR+  =0.01  m  adaptive  controller  were  both  5  m. 

Figure  8.3  shows  qualitatively  similar  improvements;  however,  in  this  scenario  (where 
puck  density  is  3.125  pucks/ square  meter,  the  improvement  is  only  slight,  even  for  large 
group  sizes). 

8.6  Future  work 

In  this  paper,  we  have  explored  only  some  of  the  ways  —  and  naive  ones  at  that  —  of 
improving  on  the  bucket  brigading  algorithm.  Future  work  may  explore  generalization  of 
the  problem  to  discard  hidden  assumptions,  or,  increasing  the  adaptability  of  the  system. 
For  instance,  a  more  plausible  biological  analog  might  be  ants  foraging  for  food.  In  such  a 
scenario,  the  ants  would  not  initially  be  uniformly  distributed  through  the  held,  nor  would 
the  food. 

The  performance  of  the  adaptive- parameter  bucket  brigade  forager  needs  to  be  more 
deeply  tested.  In  none  of  our  experiments  did  we  note  negative  marginal  performance;  the 
limits  of  the  algorithm  in  terms  of  scalability  remain  to  be  seen.  The  parameter  space  for 
the  adaptive  controller  —  the  possible  values  of  dR+  and  dR~  —  needs  to  be  explored  in 
greater  depth.  We  should  also  test  the  controller  in  more  topologically  complex  spaces,  such 
as  corridors,  as  did  [63],  and  investigate  why  the  adaptive  controller  does  not  outperform 
as  significantly  in  denser  environments. 

Briefly  mentioned  above  is  the  point  that  the  adaptive  approach  causes  a  net  increase 
in  the  number  of  a  priori  parameters:  new  parameters  added  are  the  rates  at  which  zones 
increase  and  decrease  in  size.  In  this  work,  those  parameters  were  set  experimentally.  Future 
work  may  provide  motivation  behind  values  of  these  parameters,  investigate  a  relationship 
between  optimal  values  for  dR+  and  dR~ ,  or  do  away  with  these  parameters  entirely. 

Future  work  may  also  formalize  a  closed-form  relationship  among  optimal  search  radius, 
puck  density,  and  robot  density.  Additionally,  we  have  briefly  touched  on  a  notion  of  entropy 
in  robots7  work  space  —  a  quantitative  function  of  the  environment  that  decreases  through 
useful  interactions  between  agents  and  the  environment  —  this  notion  should  be  more 
formally  analyzed. 


8.7  Conclusions 

To  summarize  the  contributions  of  this  paper:  we  replicated  the  results  of  [74],  confirming 
their  findings  with  an  independent  implementation.  Further,  we  propose  a  simple  modifi¬ 
cation  of  their  foraging  scheme  in  which  each  robot’s  foraging  area  is  adapted  in  response 
to  interference.  The  new  method  was  shown  to  improve  performance,  particularly  in  large 
population  sizes. 
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Chapter  9 

Massively  Multiple  Robot 
Simulations  in  Stage 


Author:  Richard  Vaughan 

This  chapter  is  based  on  a  paper  accepted  for  publication  by  the  journal  Swarm  Intelligence 
in  March  2008.  The  work  reported  was  not  funded  by  this  contract,  but  the  information 
here  is  relevant  to  the  project  so  it  is  included  for  convenience. 

9.1  Abstract 

Stage  is  a  C-\ — b  software  library  that  simulates  multiple  mobile  robots.  Stage  version  2, 
as  the  simulation  backend  for  the  Player/ Stage  system,  may  be  the  most  commonly  used 
robot  simulator  in  research  and  university  teaching  today.  Development  of  Stage  version  3 
has  focused  on  improving  scalability,  usability  and  portability.  This  paper  examines  Stage’s 
scalability. 

We  propose  a  simple  benchmark  for  multi- robot  simulator  performance,  and  present 
results  for  Stage.  Run  time  is  shown  to  scale  approximately  linearly  with  population  size 
up  to  2,000  robots.  For  example,  Stage  simulates  1  simple  robot  at  around  1,000  times  real 
time,  and  1,000  simple  robots  at  around  real  time.  These  results  suggest  that  Stage  may 
be  useful  for  swarm  robotics  researchers  who  would  otherwise  use  custom  simulators,  with 
their  attendant  disadvantages  in  terms  of  code  re-use  and  transparency. 

9.2  Introduction 

Stage  is  a  C++  software  library  that  simulates  multiple  mobile  robots.  Stage  version  2,  as 
the  simulation  backend  for  the  Player/ St  age  system,  may  be  the  most  commonly  used  robot 
simulator  in  research  and  university  teaching  today.  The  author  is  the  originator  and  lead 
developer  of  Stage.  The  development  of  Stage  version  3  has  focused  on  improving  scalability, 
usability  and  portability.  This  paper  examines  Stage’s  scalability. 

This  paper  is  the  first  description  of  using  Stage  for  massively  multi- robot  experiments, 
suitable  for  swarm  robotics  and  other  research  where  the  behaviour  of  large  robot  pop- 
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ul  at  ions  is  of  interest.  In  a  previous  paper  we  described  how  an  earlier  version  of  the 
Player/ Stage  system  was  useful  for  experimenting  with  a  few  tens  of  simulated  robots,  with 
controllers  transferring  easily  to  real  robots  [31] .  Here  we  focus  on  using  Stage  stand-alone 
(i.e.  without  Player)  as  a  platform  for  much  larger  robot  experiments.  To  help  researchers 
decide  whether  Stage  will  be  useful  for  their  project,  we  propose  a  simple  benchmark  for 
multi- robot  simulator  performance,  and  present  results  for  Stage  in  a  range  of  simulation 
scenarios  and  population  sizes.  This  paper  introduces  the  current  development  version  of 
Stage,  which  will  be  released  as  Stage  version  3.0.0  This  version  ha s  been  substantially 
rewritten  since  the  last  release  and  published  description.  While  it  provides  similar  func¬ 
tionality  to  version  2,  it  is  considerably  faster. 

Run  time  is  shown  below  to  scale  approximately  linearly  with  population  size  up  to  at 
least  2,000  simple  robots.  For  example,  Stage  simulates  1  simple  robot  at  around  1,000 
times  real  time,  and  1,000  simple  robots  at  around  real  time.  These  results  suggest  that 
Stage  may  be  newly  useful  for  swarm  robotics  researchers  who  would  otherwise  be  using 
in-house  custom  simulators,  with  their  attendant  disadvantages  in  terms  of  code  re-use  and 
transparency. 

9.2.1  Technical  and  methodological  features 

Stage’s  most  important  technical  features  are  (i)  its  cooperation  with  Player[34],  currently 
the  most  popular  robot  hardware  interface,  which  allows  offline  development  of  code  designed 
for  real  robots;  (ii)  it  is  relatively  easy  to  use,  yet  (iii)  it  is  realistic  enough  for  many 
purposes,  striking  a  useful  balance  between  fidelity  and  abstraction  that  is  different  from 
many  alternative  simulators;  (iv)  it  provides  models  of  many  of  the  common  robot  sensors; 
(v)  it  runs  on  Linux  and  other  Unix- like  platforms,  which  is  convenient  for  most  roboticists; 
and  (v)  it  supports  multiple  robots  sharing  a  world. 

Stage  also  has  some  very  important  non- technical  features:  (i)  Free  Software  license, 
the  GNU  General  Public  License  version  21;  (ii)  active  community  of  users  and  developers 
worldwide;  and  (ii)  status  as  a  well-known  platform.  The  positive  feedback  effect  of  these 
last  two  features  is  very  important,  as  it  creates  the  potential  to  improve  research  practice 
by  encouraging  replication  of  experiments  and  code  reuse. 

The  increased  performance  of  Stage  now  makes  it  potentially  interesting  to  researchers 
who  may  have  previously  rejected  using  a  general-purpose  simulator  due  to  lack  of  scala¬ 
bility.  Using  Stage  (or  a  derivation  of  it)  instead  of  building  a  custom  simulator  can  save 
researcher  time  and  money.  But  perhaps  more  important  is  the  methodological  advantage 
of  using  a  well-known,  free  and  open  platform,  with  open  source  code,  in  that  it  encour¬ 
ages  transparency,  replication  and  modification  of  experiments  -  a  vital  part  of  the  scientific 
method  that  is  uncommon  in  our  held  due  to  the  difficulty  and  cost  of  reimplementation. 

9.2.2  Scope  and  outline 

This  paper  is  not  intended  to  be  a  definitive  guide  to  Stage’s  internals,  nor  a  user  guide,  nor 
a  comprehensive  review  of  the  state  of  the  art  in  robot  simulation.  The  aim  is  to  concisely 
present  the  key  features  of  Stage  relevant  to  the  swarm  robotics  community,  including  the 
first  documented  performance  data  and  the  public  introduction  of  the  Stage  API. 

The  paper  proceeds  as  follows:  first  we  present  some  evidence  for  the  claims  made  in  this 
introduction.  Next  we  identify  the  current  alternatives  to  Stage.  Then  we  describe  the  key 

Tttp: //www. gnu. org/ licenses/ old- li censes/gpl- 2. O.ht ml 
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implementation  details  of  the  simulation  engine  that  are  relevant  to  large- population  perfor¬ 
mance.  Section  9.7  describes  a  minimal  and  prototypical  “swarm  robotics”  robot  controller 
and  set  of  simulation  environments  for  benchmarking  simulator  performance.  Section  9.8 
provides  benchmark  results  for  Stage,  followed  by  ideas  for  further  improving  performance 
and  new  features. 


9.3  Evidence  of  Stage  Impact 

It  is  not  practical  to  make  a  direct  estimate  of  Stage  installations  or  usage,  as  Stage’s 
Free  Software  license  allows  free  use  and  distribution.  Most  alternatives  to  Stage  have 
distribution  terms  that  make  their  use  similarly  uncountable,  so  we  can  not  know  their 
relative  popularity.  Yet  Stage  has  a  presence  at  the  major  research  conferences  and  on 
university  course  web  pages  that  suggests  it  is  widely  used  and  useful.  For  example,  at  least 
one  paper  using  Stage  for  multi- robot  simulations  has  appeared  at  ICR  A,  the  main  annual 
robotics  meeting,  every  year  since  its  debut  in  2001  [97,  29,  8,  73,  9,  55,  7,  19,  17].  Only 
the  first  of  these  papers  is  by  an  author  of  Stage. 

Player/ St  age  ha s  been  proposed  as  as  a  “unifying  paradigm  to  improve  robotics  educa¬ 
tion”  [2]  and  is  used  for  classes  at  many  universities  (by  professors  who  are  not  P/S  authors) 
including  Georgia  Tech’s  CS3630,  The  University  of  Delaware’s  CISC  849,  Washington  Uni¬ 
versity  at  St.  Louis’s  CSE  550 A,  Brown  University’s  CS  148,  the  University  of  Tenessee’s 
CS  594,  and  the  University  of  Southern  California’s  CSC  I  5842 

The  “official”  Stage  distribution  is  maintained  by  the  author  at  the  Player  Project  web¬ 
site,  hosted  at  SourceForge  3.  Sourceforge  tracks  download  statistics  for  each  package,  and 
the  download  history  for  Stage  is  shown  in  Figure  9.1.  Due  to  occasional  failures  of  the 
statistics  service,  these  numbers  are  (presumably  slight)  underestimates.  Stage  was  down¬ 
loaded  from  Sourceforge  at  least  22,121  times  between  December  2001  and  November  2007. 
Monthly  downloads  have  grown  steadily  to  a  current  rate  of  around  700  per  month. 

9.4  Related  Work 

A  good,  detailed  comparison  of  multi-robot  simulators  is  overdue.  A  recent,  flawed4  short 

survey  paper  discussed  some  popular  systems  available  in  2007.  [22] .  We  do  not  have  space 
for  a  survey  here,  but  the  reader  should  be  aware  of  the  following  influential  systems. 

•  TeamBots:5  This  simple  2D  Java- based  multi- robot  simulator  was  in  popular  use 
around  2000,  due  to  its  ease  of  use  and  free  distribution  terms,  and  the  ability  to 
run  the  same  code  in  simulation  and  on  real  robots.  One  of  the  original  Player/ St  age 
goals  was  to  reproduce  the  good  features  of  TeamBots  without  the  dependence  on 
Java.  TeamBots  is  still  available,  but  development  appears  to  have  stopped  in  2000 
[5], 

2 Web  pages  for  all  these  courses,  describing  assignments  using  Player/Stage,  were  available  online  at  the 
time  of  writing  (November  2007). 

3  http : //play or st age . sourceforge . net 

4 Stage  and  Gazebo,  two  of  the  best  known  simulators,  with  different  goals,  scope,  technology,  scaling 
characteristics,  authors  and  user  community,  are  conflated  into  one  score,  and  surprisingly  compared  with 
Microsoft  Flight  Simulator  [22]. 

5http : //www . teambots . org 
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Figure  9.1:  Stage  downloads  from  Sourceforge  by  month,  December  2003  to  November  2007 
(total  22,121). 


•  Gazebo:6  A  3D  simulator  with  dynamics  from  the  Player  Project,  Gazebo  is  based  on 
the  Open  Dynamics  Engine7.  Gazebo  has  become  popular  as  a  powerful,  high-fidelity 
simulator  that  works  with  Player  and  has  a  GNU  GPL  license.  However,  it  runs  slowly 
with  large  populations  (though  no  benchmarks  are  published)  [49] . 

•  USARSim:8  Using  the  “Unreal77  video  game  engine,  USARSim  is  a  GPL  licensed 
3- dimensional  simulator  that  is  similar  in  scope  to  Gazebo,  Webots  and  Microsoft 
Robotics  Studio.  An  important  contribution  of  USARSim  is  its  models  of  the  NIST 
reference  environments  used  for  the  RoboCup  Urban  Search  And  Rescue  competition. 
Compatible  with  Player  and  MO  AST  [71],  USARSim  is  under  active  development  and 
has  an  active  user  community.  USARSim  aims  for  accurate  dynamic  models,  and 
has  been  shown  to  support  large  worlds,  though  no  data  are  available  for  large  robot 
populations  [18]. 

•  Webots:  9  This  very  high-quality  commercial  simulator  focuses  on  accurate  dynam¬ 
ical  models  of  popular  robots.  It  is  fast  (though  no  benchmarks  are  published),  easy 
to  use  and  has  an  attractive  interface.  Webots  has  a  “Fast2DPlugin77  extension  that  is 
optimized  for  simple,  fast  simulations,  and  comes  with  detailed  models  of  robots  pop¬ 
ular  in  swarm  robotics  such  as  the  Alice,  K  hep  her  a  and  E-Puck  robots,  plus  RoboCup 
soccer  league  models.  Unfortunately  no  scaling  data  are  available,  and  Webots  is  not 
free  to  distribute  or  modify  [60]. 

•  Microsoft  Robotics  Studio:10  Released  in  early  2007,  Microsoft  Robotics  Stu¬ 
dio  is  functionally  very  similar  to  Player  and  Gazebo,  though  it  works  only  on  Mi¬ 
crosoft’s  Windows  platform.  Microsoft  is  promoting  and  financially  supporting  the 
use  of  Robotics  Studio  at  some  universities,  notably  Georgia  Institute  of  Technology 

6http : //play erst age . sourceforge . net/ index . php?src=gazebo 

rhttp : //www . ode . org 

8http : //usarsim . sourceforge . net 

9http : //www . cyberbot ics . com 

10http : //msdn2. microsof t . com/en- us/robot ics/ default . aspx 


61 


and  Bryn  Mawr  College  through  the  Insitute  for  Personal  Robots  in  Education  11 . 
Like  Gazebo  and  Webots,  Robotics  Studio  is  based  around  a  high-bdelity  dynamics 
engine.  Currently  no  examples  of  large  population  sizes  seem  to  be  available,  but  it 
is  likely,  provided  Microsoft  continues  to  support  the  project,  that  this  system  will  be 
increasingly  used  in  the  coming  years.  Robot  studio  is  free  to  use,  but  not  to  modify 
or  distribute,  and  the  source  code  is  not  publicly  available. 

*  Swarmbot3d  (S-bot  simulator):12  The  SWARM-BOT  project  and  s-bot  robot 
system  by  Mondada,  Dorigo  and  colleagues  is  a  highly  successful  and  influential  swarm 
robotics  project.  The  project  has  its  own  swarmbot3d  simulator,  which  necessarily 
includes  dynamics  as  the  s-bots  interact  with  rough  terrain  and  by  pulling  on  each 
other.  The  simulator  is  possibly  the  most  sophisticated  considered  here  and  is  based 
on  the  “Vortex”  commercial  physics  engine.  Swarmbot3d  is  described  in  detail  in  [62] , 
including  performance  data  for  models  at  various  levels  of  detail.  It  appears  to  run  at 
speeds  comparable  to  Stage,  and  appears  to  scale  approximately  linearly.  Population 
sizes  of  1  to  40  robots  are  reported.  While  Swarmbot3d  is  clearly  a  powerful  tool,  it 
does  not  appear  to  be  publicly  available. 


9.5  Stage  Internals  for  Scaling 

Stage’s  design  philosophy  has  been  described  previously  [31].  The  essential  design  feature 
that  enables  scaling  is  that  the  per-robot  computation  required  for  ray  tracing  -  used  for 
object  collision  detection  and  generating  range  sensor  data  -  is  independent  of  the  robot 
population  size .  As  ray  tracing  is  by  far  the  most  frequent  operation,  performed  possibly 
hundreds  of  times  per  robot  per  simulation  timestep,  we  can  expect  that  Stage’s  overall 
run-time  should  grow  almost  linearly  with  population  size. 

Another  performance  bottleneck  in  the  past  was  the  graphical  interface.  This  has  been 
reimplemented  using  OpenGL.  While  the  interface  has  become  more  sophisticated,  now 
presenting  3- dimensional  views,  it  has  also  gained  a  significant  performance  improvement 
due  to  highly  optimized  libraries,  drivers  and  hardware  acceleration. 

Here  we  outline  how  the  ray  tracing  population  independence  is  achieved,  and  introduce 
the  OpenGL  GUI. 

9.5.1  Physical  object  model 

Stage  models  physical  bodies  as  tree  assemblies  of  “blocks”.  Blocks  are  specified  as  ar¬ 
bitrary  polygons  on  the  [x,y]  plane,  extruded  into  the  2  axis.  The  block  data  structure 
consists  of  a  pointer  to  the  model  that  owns  the  block,  an  array  of  two-dimensional  points 
[[x,  y]o, .  . . ,  [x,  y]n],  the  array  length  n,  and  a  [^mm,  zmax]  pair  indicating  the  extent  of  the 
block  in  2.  This  implies  that  the  blocks  can  be  rotated  only  around  the  2  axis.  Because 
of  the  missing  degrees  of  freedom,  and  adopting  terminology  from  computer  graphics  and 
video  games,  we  describe  Stage  as  a  2.5D  (two and- a- half  dimensional)  simulator13.  Think 
of  blocks  as  Lego  bricks,  but  with  an  arbitrary  number  of  side  faces. 

Block  trees  can  be  constructed  piecemeal  by  user  code,  specified  in  the  worldhle  with 
arrays  of  points,  or  loaded  from  bitmaps  in  most  common  hie  formats  (JBEG,  BNG,  BMB, 

1 Tttp : //www . robot education. org/ 

1 2http :  / /www .  swarm-  bot  s .  org/ index .  php?main=3&sub=33 

13http: //en. Wikipedia. org/wiki/2. 5D 
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(a)  Pioneer  robot 


(b)  Simple  model 


(c)  Complex  model 


Figure  9,2:  Pioneer  S  DX  robot  from  MobileRobots  Inc,,  and  two  virtual  Stage  robots,,  each 
composed  of  a  StgModelPosition  object  (in  red)  and  a  StgModelLaser  object  (in  blue).  The 
simple  model  (b)  uses  2  blocks  described  by  16  numbers,,  and  requires  12  rays  for  collision 
detection.  The  complex  model  (c)  uses  11  blocks  described  by  98  numbers,,  and  requires  74 
rays  for  collision  detection.  These  are  our  reference  models  for  scaling  experiments, 

etc,).  When  interpreting  bitmaps,.  Stage  attempts  to  find  a  small  number  of  blocks  that 
occupy  the  same  grid  as  the  black  pixels  in  the  image. 

Figure  9,2  shows  two  models  supplied  with  Stage,,  both  approximating  the  popular  Pio¬ 
neer  SDX  robot.  We  compare  the  speed  of  Stage  simulations  using  these  models  below, 

9*5*2  Ray  tracing 

Collisions  between  blocks  and  range  sensor  data  are  computed  using  ray  tracing.  The  popu¬ 
lation  of  2,5D  blocks  is  rendered  into  a  2-dimensional  discrete  occupancy  grid  by  projecting 
their  shapes  onto  the  ground  plane  (z  =  0),  Grid  cell  size  is  configurable,,  with  a  default  size 
of  0,02m,  Each  grid  cell  contains  a  list  of  pointers  to  the  blocks  that  have  been  rendered 
into  that  cell.  When  a  block  moves,,  it  must  be  deleted  from  cells  that  it  no  longer  occupies 
and  rendered  into  newly-occupied  cells.  Non-moving  blocks  such  as  walls  are  rendered  into 
the  grid  only  once. 

The  ray  tracing  engine  uses  well-known  techniques,,  so  we  summarize  it  only  briefly.  For 
speed  and  memory  efficiency  we  use  a  two-level  sparse  nested  grid  data  structure.  Very 
fast  bit-manipulation  and  integer  arithmetic  are  used  to  look  up  grid  cell  locations  from 
simulation  world  coordinates  specified  in  meters,  Ray  tracing  involves  walking  through  the 
nested  grid  data  structure  using  Cohen’s  integer  line-drawing  algorithm  [89],  This  was  found 
empirically  to  be  faster  than  the  well-known  floating-point  alternative  by  Amanantides  V 
Woo  [1] ,  As  the  ray  visits  each  non-empty  cell,  we  inspect  the  2-extent  of  the  blocks  at  that 
cell,,  and  the  properties  of  the  models  owning  the  block,,  to  see  if  they  interact  with  the  ray. 

We  compared  quadtree  and  kd-tree  implementations  with  our  nested  grid,  and  found  the 
grid  to  run  considerably  faster,,  probably  due  to  better  memory  locality  and  thus  better  cache 
performance.  This  new  ray  tracing  engine  is  the  main  source  of  Stage’s  recent  performance 
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improvements,  and  efforts  to  further  improve  performance  should  focus  here. 

9.5.3  User  interface 

New  in  Stage  v3  us  a  user  interface  using  OpenGL  14  and  the  Fast  Light  Toolkit  framework 
(FLTK)  15 ,  chosen  for  speed,  ease  of  use,  and  wide  availability.  The  new  graphics  and  user 
interface  implementation  is  the  second  most  significant  performance  improvement  after  ray 
tracing.  The  OpenGL  interface  provides  a  full  3- dimensional  view  of  the  world,  and  alpha 
blending  and  antialiasing  provide  scope  for  sophisticated  effects  for  visualization  of  sensor 
data,  motion  history,  etc. 

Figures  9.4,  9.6  and  9.7  show  screenshots  from  Stage  demonstrating  the  new  3- dimensional 
view.  OpenGL  takes  advantage  of  graphics  processor  (GPU)  hardware,  and  we  plan  to  take 
further  advantage  of  the  GPU  and  also  to  design  more  advanced  sensor  and  robot  state 
visualizations  (see  Future  Work  section). 

9.6  Using  Stage  as  a  Library 

To  date,  Stage  is  most  commonly  used  with  Player,  to  form  the  Player/  St  age  system.  Robot 
controllers  are  written  as  Player  clients,  and  communicate  with  Player/Stage  through  net¬ 
work  sockets.  It  is  not  well  known  that  Stage  version  2  is  quite  usable  as  a  stand-alone  C 
library,  allowing  users  to  embed  a  complete  Stage  simulation  into  their  programs.  This  is 
done,  for  example,  in  MobileSim16,  the  simulator  shipped  by  MobileRobots,  Inc.  to  support 
their  range  of  robots.  MobileSim  author  Reed  Hedges  contributes  bugfixes  and  features 
back  to  the  Stage  library.  Until  this  paper,  we  have  not  publicised  this  way  of  using  Stage. 

However,  Player  clients  send  and  receive  data  asynchronously  with  Player  through  the 
host’s  networking  subsystem.  This  requires  significant  interaction  between  the  client  and 
server  processes  and  the  host  operating  system,  with  frequent  context  switches  and  likelihood 
of  messages  waiting  in  queues.  This  overhead  may  not  be  acceptable  when  attempting  to 
scale  robot  populations.  As  the  Player  client-server  link  is  asynchronous,  it  is  not  generally 
possible  to  run  repeatable  experiments.  Interaction  with  the  OS  scheduler  and  ambient 
processes  means  that  Player  clients  are  subject  to  apparently  stochastic  time  delays  and 
will  produce  stochastic  robot  behaviour.  Some  users  (including  the  author)  have  found  it 
difficult  to  maintain  good  enough  synchronization  between  client  and  Player/Stage  v.2  when 
the  host  machine  is  heavily  loaded,  for  example  by  a  large  Stage  simulation.  In  particular, 
the  latency  experienced  by  a  client  is  not  independent  of  robot  population  size,  leading  to 
poor  robot  behaviour.  Efforts  are  underway  to  solve  this  problem. 

For  large  scale  populations,  in  cases  where  the  Player  features  are  not  required,  we  now 
recommended  using  Stage  directly  as  a  library.  Stage  is  deterministic,  so  a  simulation  using 
only  deterministic  robot  controllers  will  be  perfectly  repeatable.  Sensor  data  and  control 
commands  are  contained  within  a  single  running  process,  avoiding  costly  context  switches 
and  exploiting  the  CPU  cache. 

Stage  version  3  is  now  a  C++  library,  which  can  be  used  to  create  a  complete  simulator 
by  creating  a  single  “StgWorld”  object  and  accessing  its  methods.  Rather  than  devote  a  lot 
of  text  to  this  subject,  we  provide  a  complete  minimal  instance  of  Stage  program  in  Figure 


1 4http : / /www . opengl . org 
15http : //www . f ltk . org 

1 6urlhtt  p :  /  /  robots  .mobilerob  ot  s .  com/  M  obileSim 
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# include  <stage,hh> 


int  main(  int  argc ,  char*  argv[]  ) 

i 

StgWorld : : Init (  &argc ,  &argv  );  //  initialize  libstage 

StgWorldGui  world(  800,  600,  "My  Stage  simulation"); 
world, Load(  argv[l]  );  //  load  the  world  file 

//  get  a  pointer  to  a  mobile  robot  model 

//  char  array  argv[2]  must  match  a  position  model  named  in  the  worldfile 
StgModelPosition*  robot  =  (StgModelPosit ion*) world. GetModelC  argv[2]  ); 
robot->Subscribe() ; 

world, Start () ;  //  start  the  simulation  clock  running 

while (  !  world. TestQuit ()  ) 
if(  world.RealTimeUpdateO  ) 

i 

ii  [  read  sensors  and  decide  actions  here  ] 
robot ->Do(  f orward_speed,  side_speed,  turn_speed  ) ; 

> 

delete  robot ; 
exit (  0  ) ; 


Figure  9.3:  Runnable  C++  source  code  for  a  minimal  but  complete  Stage  robot  simulation. 
Error  checking  is  omitted  for  brevity.  The  resulting  compiled  program  could  be  run  with 
the  worldfile  of  Figure  9.4  like  so:  ,/mystagesim  minimal  .world  MySimpleRobot.  The 
resulting  simulation  would  look  like  the  screenshot  in  Figure  9.5. 


9.3,  a  matching  world  description  hie  and  a  screenshot  of  the  resulting  world  in  Figure  9.4. 
This  should  be  sufficient  for  a  programmer  to  grasp  the  main  ideas.  Several  examples  are 
provided  in  the  Stage  distribution. 

9.7  A  Benchmark  for  Swarm  Simulation 

We  seek  to  obtain  empirical  run-time  data  for  massively  multi- robot  simulation  systems, 
in  order  to  evaluate  them  and  compare  their  performance.  Even  while  data  for  no  other 
simulator  is  available,  the  benchmark  would  allow  Stage  users  to  evaluate  the  effect  of  their 
code  additions  and  optimizations.  We  have  been  unable  to  find  a  suitable  benchmark  in  the 
literature,  so  we  propose  the  following  simple  procedure.  It  should  be  possible  to  reproduce 
this  procedure  very  closely  on  any  comparable  system. 

A  benchmark  robot  controller  and  world  description  are  required,  including  a  population 
of  robot  models.  For  simplicity,  run  the  identical  robot  controller  concurrently  on  every 
robot.  Start  the  simulation  and  measure  the  real  time  it  takes  to  simulate  a  fixed  amount 
of  simulated  time.  For  each  [  controller,  world  description  ]  pair,  obtain  the  run-time  for 
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#  Simulation  worlds  are  specified  using  a  simple  tree-structured 

#  configuration  file,  or  by  user  programs  making  individual  object 

#  creation  and  configuration  function  calls.  This  is  an  example  of  a 

#  simple  configuration  file 

#  size  of  the  world  in  meters  [x  y  z] 
size  [16  16  3] 

#  create  and  specify  an  environment  of  solid  walls  loaded  from  an  image 
model 

( 

#  size  of  the  model  in  meters  [x  y  z]  -  fill  the  whole  world 
size3  [16  16  0.5] 

color  Mgray30" 

#  draw  a  lm  grid  over  this  model 
gui.grid  1 

#  this  model  can  not  be  moved  around  with  the  mouse 
gui_movemask  0 

#  interpret  this  image  file  as  an  occupamcy  grid  and 

#  construct  a  set  of  blocks  that  fill  this  grid 
bitmap  " bitmaps/ cave .png" 

#  model  has  a  solid  boundary  wall  surrounding  the  blocks 
boundary  1 

) 

#  create  and  specify  a  mobile  robot  carrying  a  laser  scanner 
position 

( 

name  "MySimpleRobot " 

color  "red" 

size3  [0.5  0.5  0.4] 

#  pose  of  the  model  in  the  world  [x  y  heading] 
pose  [-2.551  -0.281  -39.206] 

laser () 

) 

Figure  9.4:  Simple  example  of  a  Stage  configuration  file  (“world  file”).  A  screenshot  of  the 
world  it  produces  is  shown  in  Figure  9.5 


66 


Figure  9.5:  Screen  shot  of  the  world  created  by  the  worldfile  shown  in  Figure  9.4. 


a  single  robot,  then  a  series  of  exponentially  increasing  population  sizes.  For  simulators  or 
controllers  with  stochastic  behaviour,  repeat  each  experiment  to  sample  the  distribution  of 
run  times. 

9*7.1  Benchmark  robot  controller 

It  is  unlikely  that  any  single  robot  controller  will  satisfy  every  user,  and  any  “real”  controller 
that  performs  a  particular  useful  swarm  behaviour  from  the  literature  could  be  argued  to 
be  too  specialized.  We  prefer  a  controller  that  uses  very  little  computation  and  memory 
compared  to  the  simulator,  to  avoid  controller  costs  masking  the  simulator  performance.  Yet 
we  want  to  see  robot  behaviour  which  is  somewhat  representative  of  real  swarm  robotics 
research. 

A  canonical  application  for  multi-robot  systems,  with  approaches  ranging  from  swarm 
intelligence  to  centralized  planning,  is  wireless  sensor  network  deployment  and  maintenance 
([96,  44,  20]  plus  several  ICRA  articles  cited  above)  and  the  subject  of  past  and  present 
DARPA  projects.  Multi-robot  deployment  overlaps  with  other  canonical  tasks  such  as 
exploration  and  foraging,  though  it  is  not  a  good  model  of  trail- following  or  cooperative 
manipulation. 

A  minimal  version  of  the  deployment  problem  is  dispersal  into  the  environment,  which 
can  be  achieved  using  a  mobile  robot  with  an  array  of  directional  range  sensors  using  the 
following  trivial  algorithm: 

1.  gather  array  of  (range,  direction)  vectors 

2.  compute  vector  sum  S  of  array 

3.  IF(  (heading  —  ZS  &  0  )  AND  (  free  space  ahead  )) 

THEN  (  go  forwards  ) 

ELSE  (  turn  to  reduce  heading  error  ) 

4.  GOTO  1 
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With  suitable  parameter  choices  depending  on  robot  speed,  number  of  sensors,  maximum 
obstacle  detection  ranges,  etc.,  this  algorithm  is  highly  effective  at  dispersing  the  robots  until 
each  robot  has  no  object  inside  its  sensor  range.  The  algorithm  is  a  simple  approximation 
of  Howard’s  approach  [43],  is  trivial  to  implement,  uses  a  single,  commonly  available  class 
of  sensor,  and  has  the  advantages  of  being  stateless,  thus  requiring  no  per- robot  storage 
between  updates,  and  deterministic,  avoiding  the  need  for  repeated  trials. 

Dispersal  presents  a  worst- case  scenario  for  grid- based  simulations  like  Stage,  as  the 
amount  of  free  space  through  which  rays  must  propagate  is  maximized.  Geometric  simula¬ 
tions  and  those  that  track  object  bounding  boxes  may  have  an  easier  time,  as  the  frequency 
of  bounding  box  overlaps  is  minimized.  If  it  is  desirable  to  test  the  simulation  performance 
for  tightly  clustered  robots,  the  controller  can  trivially  be  inverted  to  produce  local  cluster¬ 
ing. 

We  use  this  controller  for  the  following  benchmarking  experiments,  and  hope  that  it  may 
be  useful  for  others  or  inspire  (or  provoke)  the  design  of  a  superior  method. 

9.8  Stage  Benchmarks 

We  present  benchmark  results  for  Stage  using  multiple  simulation  scenarios.  It  is  impractical 
to  exhaustively  explore  the  huge  space  of  Stage  configurations,  but  our  choices  span  a  range 
intended  to  be  of  interest  to  different  types  of  user.  The  configurations  described  here 
should  be  straightforward  to  reproduce  on  most  alternative  simulators,  and  the  hies  used  to 
implement  them  will  be  included  with  the  Stage  v3  distribution. 

9.8.1  Ray  Tracing  resolution 

The  spatial  resolution  of  the  underlying  ray  tracing  engine  was  fixed  for  all  tests  at  0.02m. 
This  is  Stage’s  default  setting,  and  an  informal  survey  of  users  suggest  that  few  people  ever 
change  this  parameter.  This  means  that  range  sensors  and  collision  detection  is  accurate  to 
this  value.  As  discussed  above,  Stage  contains  no  noise  models  for  range  data,  so  ranges  are 
effectively  quantized  at  0.02m. 

9.8.2  Environments 

We  use  two  different  environments  created  by  the  author,  included  for  several  years  in 
the  Stage  distribution  and  used  in  papers  by  multiple  researchers:  the  “cave  world”  and 
the  “hospital  world”.  The  cave  was  drawn  by  hand  in  1998,  and  models  a  moderately 
constrained  environment,  either  a  crude  artificial  structure  or  a  regular  natural  structure. 
The  hospital  was  created  in  1999  by  hand- editing  a  CAD  drawing  of  the  floor  plan  of  the 
disused  military  hospital  at  Fort  Sam  Houston,  San  Antonio,  Texas,  site  of  previous  DARPA 
robot  demonstrations.  The  bottom-left  hand  corner  of  the  hospital  world  is  popularly  used 
as  a  generic  indoor  environment,  and  is  referred  to  as  the  “hospital  section”. 

Once  loaded  from  their  original  bitmap  hies,  the  cave  consists  of  489  blocks  and  is  scaled 
to  fill  a  16m  by  16m  world.  The  hospital  consists  of  5744  blocks,  scaled  approximately  life 
size  140m  by  60m.  All  blocks  are  rectangular  polyhedra,  with  edges  aligned  with  the  global 
axes.  The  cave  and  hospital  maps  are  shown  in  Figure  9.6  and  9.7  respectively. 
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9.8.3  Mobile  robot  models 

We  use  the  two  mobile  robot  models  shown  in  Figure  9.2,  similar  but  for  the  number  and 
complexity  of  the  blocks  that  make  up  their  bodies.  They  are  0.44m  long  by  0.38m  wide  by 
0.22m  high.  Compared  to  the  octagonal  “simple”  robot  model  the  “complex”  robot  model 
requires  6  times  as  many  ray  tracing  operations  to  detect  collisions  (or  lack  of  collisions) 
at  each  simulation  update.  It  is  also  more  expensive  to  draw  in  the  GUI  window,  though 
the  relative  costs  are  hard  to  estimate  without  detailed  knowledge  of  the  specific  OpenGL 
and  rendering  implementation.  A  third  model,  the  “swarmbot”  is  used  for  larger  scale 
experiments,  and  is  similar  in  shape  to  the  “simple”  model,  but  is  smaller,  occupying  a  cube 
with  sides  of  0.2m. 

9.8.4  Sensor  models 

The  “simple”  and  “complex”  mobile  robot  models  are  identically  equipped  with  generic 
Stage  range  finder  models,  16  sensors  distributed  around  the  robot’s  boundary,  each  com¬ 
puted  by  casting  a  single  ray  through  free- space  until  a  sonar- reflecting  object  is  struck. 
Their  maximum  range  matches  the  Pioneer  robot’s  sonar  sensors  at  5m. 

The  “swarmbot”  has  12  range  finders  spaced  evenly  around  its  body,  with  a  maximum 
range  of  2m.  This  models  a  long-range  infared  range- finder  or  a  short  range  sonar,  such  as 
is  typically  found  on  small  robots  designed  for  swarm  experiments. 

In  addition,  we  test  some  scenarios  with  a  model  of  a  scanning  laser  rangefinder,  pa¬ 
rameterized  to  approximate  the  popular  SICK  LMS  200  device,  except  that  it  gathers  180 
samples  over  its  180  degree  held  of  view,  instead  of  the  361  samples  of  the  real  device.  The 
laser  model  has  a  maximum  range  of  8m:  the  default  on  the  real  SICK.  Due  to  its  long 
range  and  large  number  of  samples,  the  laser  range  model  is  much  more  computationally 
expensive  than  the  sonar/infrared  model. 

9.8.5  Graphics 

To  examine  the  performance  impact  of  the  graphics  rendering  and  user  interface,  the  exper¬ 
iments  were  performed  with  and  without  the  user  interface.  With  the  interface  enabled,  the 
window  is  repainted  at  the  default  rate  of  100ms.  We  expect  that  Stage  should  run  faster 
with  the  interface  disabled. 

9.8.6  Host  computer 

The  host  computer  was  an  Apple  MacBook  Pro,  with  a  2.33GHz  Intel  Core  2  Duo  processor 
and  2GB  RAM,  with  a  an  ATI  Radeon  XI 600  graphics  chipset  with  256MB  VRAM,  running 
Mac  OS  X  10.5.1.  At  the  time  of  writing  this  is  a  high-end  laptop,  equivalent  to  a  mid-range 
desktop  PC. 

9.8.7  Results 

The  benchmark  robot  controller  was  run  in  the  cave  world  for  all  8  permutations  of  [  model 
type,  laser  enabled/ disabled  and  graphics  enabled/ disabled  ].  Each  permutation  was  run  for 
600  seconds  with  population  sizes  of  1,  10,  50  and  100  robots  and  the  real-world  run  time 
recorded  in  Table  1.  The  success  of  the  robot- dispersal  algorithm  can  be  seen  in  Figure  9.6, 
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Figure  9.6:  The  Cave  world  with  100  complex  Pioneer  models  with  lasers  in  the  initial  state 
(top  left);  after  600  simulated  seconds  (top  right);  and  a  3D  view  of  the  center  of  the  world 
at  60  seconds  (bottom). 
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Figure  9,7:  The  Hospital  world  with  2000  swarmbots  in  its  initial  state  (top)  (robots  are 
tightly  packed  and  appear  as  a  dark  horizontal  band);  after  600  simulated  seconds  (middle); 
and  a  SD  view  of  the  center  of  the  world  at  60  seconds  (bottom). 
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Figure  9.8:  Summary  of  benchmark  results.  The  ratio  of  simulation  time  to  real  time  is 
plotted  against  population  size  on  a  log/log  scale,  for  each  world  configuration.  Anything 
above  the  x  —  1  line  is  running  faster  than  real  time.  The  run-time  performance  scales 
approximately  linearly  with  population  size  in  the  range  tested. 
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Model  type 

Laser 

Graphics 

Run  time  for  population  size 

1  10  50  100 

Simple 

- 

- 

0.55 

4.98 

to 

Li 

00 

46.23 

77 

- 

yes 

1.21 

5.2 

26.69 

50.44 

77 

yes 

- 

4.06 

42.70 

229.48 

449.44 

77 

yes 

yes 

4.83 

45.71 

243.28 

476.34 

Complex 

- 

- 

1.23 

12.03 

59.15 

116.69 

77 

- 

yes 

1.71 

13.35 

65.71 

137.15 

77 

yes 

- 

5.21 

53.61 

to 

00 

-1 

550.11 

77 

yes 

yes 

6.05 

58.44 

315.83 

659.66 

Table  9.1:  Cave  world  Results:  real-world  run  time  for  600  seconds  of  simulation  time,  for 
the  Cave  world  in  40  different  configurations.  Cave  world  is  shown  in  Figure  9.6.  Simple 
and  Complex  models  are  shown  in  Figure  9.2 


Model  type  Graphics 

Run  time  for  population  size 

1  10  100  1000  2000 

Swarmbot 

Swarmbot  yes 

0.63  3.41  31/13  325.77  674.00 
1.70  8.00  73.87  546.67  935.41 

Table  9.2:  Hospital  results:  real-world  run  time  for  600  seconds  of  simulation  time,  for  the 
Hospital  world  in  40  different  configurations.  Hospital  world  and  swarmbot  model  are  shown 
in  Figure  9.7. 


which  shows  a  global  view  of  the  Cave  world  with  100  complex  robots  at  zero  seconds  and 
600  seconds,  and  an  intermediate  3D  view  of  the  center  of  the  world  at  60  seconds. 

The  benchmark  robot  controller  was  also  run  for  600  seconds  in  the  hospital  world,  with 
graphics  enabled  and  disabled,  for  population  sizes  of  1,  10,  100,  1000  and  2000  robots.  The 
real  world  run  times  are  recorded  in  Table  2.  The  success  of  the  robot-dispersal  controller 
can  be  seen  again  in  Figure  9.7,  where  2000  swarmbot s  are  shown  in  the  hospital  world  at 
zero  and  600  seconds,  and  in  an  intermediate  3D  view  at  60  seconds. 

We  see  that,  as  expected,  larger  population  sizes  take  longer  to  run.  Use  of  the  complex 
model  and  the  laser  scanner  also  increased  run  time,  as  we  would  expect  from  their  ray 
tracing  demands. 

To  compare  the  run  time  data  between  trials,  we  plot  the  ratio  of  simulated  to  real  time 
(i.e.  multiples  of  real  time  (MRT),  or  “speed-up  factor”)  in  Figure  9.8.  The  population  size  is 
plotted  against  MRT  on  a  log/log  scale,  and  similar  scenarios,  differing  only  in  population 
size,  are  connected  with  lines  to  form  a  scaling  curve.  We  see  that  all  the  curves  are 
approximately  linear,  indicating  that  Stage  run  time  does  increase  linearly  with  population 
size  as  hoped.  We  also  see  that  all  but  the  three  most  demanding  simulations  ran  faster 
than  real  time. 

To  summarize  the  results,  these  data  show  that  with  a  trivial  robot  controller  Stage  can 
simulate  small  populations  of  complex  robots  much  faster  than  real  time.  Populations  of 
around  100  complex  robots  or  1,000  simple  robots  can  run  in  real  time  or  less. 
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9.9  Future  Work 

9.9.1  Performance  and  Scaling 

We  are  interested  in  running  experiments  with  populations  one  to  two  orders  of  magnitude 
larger  than  we  have  considered  in  this  paper.  Populations  of  10,000  to  100,000  robots  are 
in  the  regime  of  interesting  natural  swarms.  To  achieve  the  required  further  10  to  100  times 
increase  in  performance,  without  fundamentally  changing  our  simulation  model,  we  have 
three  main  areas  of  investigation,  listed  in  reverse  order  of  potential  scaling  benefit: 

Optimize  code  efficiency 

There  is  certainly  scope  for  improvement  in  the  performance  of  Stage’s  frequently  run  inner 
loops  -  mainly  ray  tracing  and  graphics  rendering.  In  addition,  to  date  little  attention  has 
been  paid  to  per- robot  memory  size  and  layout.  More  careful  memory  use  could  further 
improve  cache  hit  rates,  and  allow  larger  populations  before  exceeding  a  host  machine’s 
physical  memory. 

Concurrency  -  threads 

Multi- core  CPUs  and  multi-CPU  machines  have  become  common.  In  principle,  Stage  can 
benefit  from  parallel  computation.  Sections  of  the  world  that  do  not  interact  with  each 
other  can  run  concurrently  without  any  synchronization  overhead.  Thus  we  could  in  prin¬ 
ciple  obtain  a  near  Af-times  speed  increase  with  N  additional  processor  cores.  Currently 
fast  machines  with  8  cores  are  available.  The  difficulty  comes  in  maintaining  suitable  par¬ 
titions,  and  finding  linear  time  or  better  algorithms  for  partitioning  physical  simulations  is 
an  interesting  area  for  research. 

Concurrency  -  cluster  computing 

The  greatest  opportunity  for  massive  scaling  comes  with  cluster  computing,  in  which  many 
hosts  can  be  pooled  into  a  distributed  system  (e.g.  Google’s  huge  systems  [6]).  Distributing 
Stage  over  a  cluster  is  a  similar  problem  to  using  threads,  but  with  much  greater  b etwee n- 
partition  communication  costs,  and  the  addition  of  reliability  issues  that  can  usually  be 
ignored  on  single  machines.  We  will  investigate  a  distributed  Stage. 

9.9.2  Other  plans  relevant  to  swarm  robotics 

Multi-robot  visualization  tools 

We  would  like  tools  for  analysing,  debugging  and  presenting  the  state  evolution  of  large-scale 
multi-robot  systems.  We  wish  to  visualize  the  external  and  internal  state  of  thousands  of 
agents;  to  obtain  insight  into  important  events,  causal  relationships,  patterns  and  problems. 
Exploiting  the  new  OpenGL  view  of  the  world,  we  will  gradually  add  visualizations  to  the 
main  distribution.  For  example,  Stage-3.0.0  provides  various  views  of  the  robot’s  movement 
history. 
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Standard  API 


Stage  currently  offers  a  custom  API  for  creating  simulations  and  writing  robot  controllers. 
It  would  be  more  useful  if  user’s  robot  controllers  were  directly  portable  to  Player,  and  other 
robot  interfaces  or  simulators.  The  Player  Project  has  begun  work  on  creating  a  portable 
robot  controller  API,  based  on  the  Player  Abstract  Device  Interface,  to  be  implemented 
initially  in  Player  and  Stage.  This  is  a  long  term  community  effort. 

Portability 

Stage  v3  is  designed  to  be  portable.  It  currently  runs  on  all  major  systems  except  Windows. 
We  aim  to  have  Stage  build  and  run  on  Windows  as  standard  by  the  middle  of  2008.  The 
main  goal  of  this  effort  is  to  make  Stage  more  accessible  for  students. 

9.10  Conclusion 

We  have  argued  that  Stage  may  be  useful  to  swarm  robotics  researchers  because  (i)  it 
supports  large  numbers  of  robots  at  real  time  or  better;  and  (ii)  it  is  well  known  in  the 
multi- robot  systems  community,  and  therefore  has  methodological  advantages  compared  to 
a  dedicated  simulator. 

To  demonstrate  the  new  performance  of  Stage  version  3,  we  presented  and  used  a  simple 
benchmark  for  mobile  robot  simulators.  Stage’s  run  time  was  shown  to  scale  approximately 
linearly  with  population  size  up  to  100  computationally  demanding  robots  or  2,000  simple 
robots. 

Stage  is  not  useful  for  every  project,  but  version  2  has  proved  itself  useful  to  many.  Stage 
version  3  adds  a  significant  performance  increase  to  this  open,  free  and  well-known  platform. 

9.11  Supporting  files 

All  source  code  and  configuration  hies  required  to  reproduce  these  benchmarks  are  included 
in  Stage  source  code  distributions  since  Stage-3.0.0,  in  the  directory  worlds/benchmark. 
The  configuration  hies  are  either  human-readable  text  hies  or  PNG  format  bitmaps.  These 
hies  and  this  paper  should  be  sufficient  information  to  adapt  the  benchmark  for  another 
simulator. 
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